Journal of Applied Psychology 


John G. Darley, Editor 
University of Minnesota 


Lorraine Bouthilet, Managing Editor 





Table of Contents 


Differential Performance of Fleet and Recruit Personnel in Torpedoman’s Mates School: 
R. B. Allison, Jr. 


Differentiation of Successful and Unsuccessful Premedical Students: R. S. Melton 


A Comparison of Parametric and Nonparametric Analyses of Opinion Data: H. Rosen and 
R. A. H. Rosen 


Brand Loyalty—Twelve Years Later: L. Guest 


Relation of Positive and Negative Sociometric Valuations to Social and Personal Adjustment 
of School Children: B. N. Phillips and M. V. DeVault 


Is Interest Maturity Related to Linguistic Development? M. D. Woolf and J. A. Woolf 


Technique of Problem Solving as a Predictor of Achievement in a Mechanics Course: E. L. 
Gaier 


A National Answer to the Question, “Do Sons Follow Their Fathers’ Occupations?”: P. G. 
Jenson and W. K. Kirchner 


Population Stereotypes in Pedal Control of a “Ball-Bank” Indicator: B. R. Bugelski 


Response Preferences in Display-Control Relationships: S. Ross, B. E. Shepp, and T. G. 
Andrews 


—— Width, Illumination Level, and Figure-Ground Contrast in Numeral Visibility: R. S. 


The Peripheral Viewing of Dials: J. W. Senders, I. B. Webb, and C. A. Baker 
Relative Effectiveness of Two Standard Color-Vision Tests: G. L. De Nittis 
An Attempt at Validation of the Empathy Test: G. B. Bell and R. Stolper 
Prolonged Reading Tasks in Visual Research: M. A. Tinker 


The Influence of Color of Paper upon Scores Earned on Objective Achievement Examination: 
W. B. Michael and R. A. Jones 


Rater Reliability and “Judgmental Fatigue”: A. W. Bendig 
A Note on Alternative Methods for Estimating Factor Scores: D. K. Trites and S. B. Sells 


Memory for Names and Faces: A Characteristic of Social Intelligence? W. A. Kaess and 
. Witryol 


Book Reviews 





American Psychological Association 


Volume 39, Number 6 | December, 1955 





Consulting Editors 


Harold E. Burtt, Ohio State University 
Alphonse Chapanis, Johns Hopkins University 
Clifford E. Jurgensen, Minneapolis Gas Company 
Laurence S. McGaughran, University of Houston 
Quinn McNemar, Stanford University 
Alexander Mintz, City College of New York 
Harold F. Rothe, Fairbanks, Morse and Company 
Julian B. Rotter, Ohio State University 
Donald E. Super, Columbia University 
Miles A. Tinker, University of Minnesota 





This journal gives primary consideration to origi- 
nal investigations in any field of applied psychol- 
ogy except clinical and consulting psychology, al- 
though a descriptive or theoretical article may be 
accepted if it represents a special contribution in 
an applied field. Quantitative investigations of in- 
terest or value to psychologists working in the fol- 
lowing broad fields will be considered: vocational 
and educational prognosis, diagnosis, and guidance 
at the secondary and college level; personnel re- 
search in business, industry, and government; bio- 
mechanics; industrial working conditions; research 
on opinion and morale factors; job analysis and 
classification research; market and advertising re- 
search. 


Because of the large number of manuscripts 
submitted, authors should adhere to the rule of 
“brevity consistent with clarity.” The typical 
manuscript should run to approximately 4,000 
words. There is a lag of approximately twelve 
months between receipt and publication of an 
article. Authors may request advanced publica- 
tion if they are prepared to pay the cost of print- 
ing the necessary extra pages. 

Manuscripts should be addressed to the Editor, 
John G. Darley, 408 Johnston Hall, University of 
Minnesota, Minneapolis 14, Minnesota. All manu- 
scripts should be submitted in duplicate. Original 
figures are prepared for publication; duplicate fig- 
ures may be photographic or pencil-drawn copies. 

Manuscripts must conform to the style require- 
ments described in the “Publication Manual of the 
American Psychological Association,” Psychol. Bull., 
1952, 49, No. 4, Part 2. 





Journal of Applied Psychology 


Published bimonthly by the 
American Psychological Association 
Prince and Lemon Sts., Lancaster, Pa. 
and 1333 Sixteenth Street N.W. 
Washington 6, D. C. 


$7.00 per volume          $1.50 per issue 

Subscriptions, orders, and business communications should be addressed to the American Psychological 
Association, 1333 Sixteenth St. N.W., Washington 6, D. C. Address changes must reach the subscription office by the 25th of 
the month to take effect the following month. Undelivered copies resulting from address changes will not be replaced; 
subscribers should notify the post office that they will guarantee second-class forwarding postage. Other claims for 
undelivered copies must be made within four months of publication. 


Entered as second-class matter, August 19, 1943, at the post office at Lancaster, Pa., under the act of March 3, 1879. 


Acceptance ... Section 34.40, P. L. & R. of 1948, authorized ... 

Copyright © 1955 by the American Psychological Association, Inc. 





Journal of Applied Psychology 








Vol. 39, No. 6 


DECEMBER, 1955 








Differential Performance of Fleet and Recruit Personnel in 
Torpedoman’s Mates School 1 


Roger B. Allison, Jr. 


Educational Testing Service 


Frequently candidates for industrial jobs 
or schools are drawn from several different 
sources. An important problem that arises is 
whether a different selection procedure should 
be used for candidates from each source, or 
whether the characteristics of these groups of 
individuals are sufficiently similar to make a 
single selection procedure appropriate. For 
example, in the Navy this type of problem 
occurs when students for training schools are 
selected either from the fleet or from recruit 
training centers, with such selectees obviously 
differing with respect to their experience in 
the Navy. For some Navy schools, to be 
selected from recruit training centers a re- 
cruit must have test scores at or above es- 
tablished critical scores, whereas for students 
from the fleet this requirement may be waived 
in some instances. 

This study was undertaken to determine 
whether the regression of school success upon 
predictor variables was the same for recruit 
and fleet students. If the two groups of sub- 
jects performed differently in school relative 
to ability, then separate selection procedures 
would seem advisable. 


Method 
A. The Subjects 


The data upon which this study was based were 
secured in connection with another study (1) con- 
ducted during the last six months of 1952 at a Class 
“A” Torpedoman’s Mates School. A total of 276 
students from six consecutive classes at the school 
participated in the study. Ninety-five of these stu- 
dents had been selected from the fleet, whereas the 
remaining 181 had been selected from recruit 
training centers. 


1 This study was performed under Navy Contract 
Nonr-694(00) between the Office of Naval Research 
and Educational Testing Service. The opinions ex- 
pressed are those of the author and do not neces- 
sarily represent opinions of the Office of Naval Re- 
search or the Bureau of Naval Personnel. Thanks 
are due to Dr. William G. Mollenkopf, Principal In- 
vestigator under the contract, and to Dr. Norman 
Frederiksen, Director of General Research at ETS. 


Students for the Torpedoman’s Mates School were 
selected primarily on the basis of scores on the Me- 
chanical Test, a score of 55 or better being required 
for selection. This score is exceeded by approxi- 
mately 31 per cent of all Naval enlisted personnel. 
About 2 per cent of the recruit subjects in the study 
were found to have scores below this point. How- 
ever, in the case of fleet personnel (for whom this 
requirement was often waived for various reasons), 
approximately one-half of the subjects were found 
to have scores below the cutoff point. 
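
The 31 per cent figure can be checked against the scaling of the Basic Test Battery (mean 50, standard deviation 10). The sketch below assumes that scores are approximately normally distributed in the enlisted population; that assumption, and the function name `fraction_above`, are illustrative and are not stated in the study.

```python
from statistics import NormalDist


def fraction_above(cutoff, mean=50.0, sd=10.0):
    """Share of a normal population scoring above `cutoff`.

    Assumes normally distributed standard scores (an assumption of this
    sketch, not a statement made in the study).
    """
    return 1.0 - NormalDist(mu=mean, sigma=sd).cdf(cutoff)


share = fraction_above(55)
print(f"{100 * share:.1f} per cent of scores exceed 55")
```

A cutoff of 55 lies half a standard deviation above the mean, which under normality leaves a little under one-third of the population above it.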


B. The Predictor Variables 


1. General Classification Test (GCT). Verbal ability 
   as reflected in analogies and sentence 
   completion items. 

2. Arithmetic Test (ARI). Ability to perform 
   routine computations and to solve quantitative 
   problems. 

3. Mechanical Test (MECH). Mechanical and 
   electrical knowledge and mechanical 
   comprehension. 

4. Clerical Aptitude Test (CA). A speeded test 
   involving name and number checking. 

5. Block Counting Test (SPA). Test of spatial 
   ability. 


The first four predictors currently constitute the 
Navy Basic Test Battery and are administered to 
incoming recruits. Their means and standard de- 
viations are set at 50 and 10, respectively, for the 
population of all enlisted personnel. The spatial test 
was not so scaled and scores were expressed as num- 
ber right. 


6. Age expressed in years at time of experimental 
testing. 


7. Education expressed in years of attendance. 


C. The Criterion Variables 


Two criteria of success in the Torpedoman’s Mates 
School were obtained. 

1. Average grade on weekly performance tests 
(AWPT). This score was the average score on nine 
or more of the eleven weekly performance tests ad- 
ministered during the training course. These per- 
formance tests were based upon such topics as main 
engines, gyros, basic electricity, Mark 27 torpedo, 
and so on. 

2. Final grade in course. This grade was a com- 
posite of the average weekly grade (60%), the grade 
earned on the final written test (20%), and the 
grade earned on the final identification test (20%). 




The average weekly grade had been obtained by 
weighting the average grade on the weekly perform- 
ance tests 60%, the average grade on weekly writ- 
ten tests 20%, and the average grade on weekly 
identification tests 20%. These percentage weights 
do not take into consideration the intercorrelations 
or the variabilities of the measures involved, and 
thus do not necessarily represent the effective weights 
of these variables. 
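
The nested weighting described above can be written out directly. The sketch below only restates the nominal percentages given in the text; the function and key names are hypothetical, and, as the text notes, nominal weights are not the same thing as effective weights, which also depend on the variabilities and intercorrelations of the component grades.

```python
# Nominal percentage weights as described in the text.
WEEKLY_WEIGHTS = {"performance": 0.60, "written": 0.20, "identification": 0.20}
FINAL_WEIGHTS = {"weekly_average": 0.60, "final_written": 0.20,
                 "final_identification": 0.20}


def weekly_average(performance, written, identification):
    """Average weekly grade from the three weekly test averages."""
    w = WEEKLY_WEIGHTS
    return (w["performance"] * performance
            + w["written"] * written
            + w["identification"] * identification)


def final_grade(weekly_avg, final_written, final_identification):
    """Final course grade from the weekly average and the two final tests."""
    w = FINAL_WEIGHTS
    return (w["weekly_average"] * weekly_avg
            + w["final_written"] * final_written
            + w["final_identification"] * final_identification)


def nominal_weights():
    """Nominal share of each component in the final grade."""
    share = FINAL_WEIGHTS["weekly_average"]
    weights = {f"weekly_{name}": share * w for name, w in WEEKLY_WEIGHTS.items()}
    weights["final_written"] = FINAL_WEIGHTS["final_written"]
    weights["final_identification"] = FINAL_WEIGHTS["final_identification"]
    return weights


print(nominal_weights())
```

On these nominal figures the weekly performance tests (AWPT) carry 0.60 x 0.60, or 36 per cent, of the final grade, so the two criteria overlap by construction.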


D. Statistical Technique 


One way of comparing the scholastic achievement 
of these two groups would be to determine whether 
the differences in mean criterion measures were sta- 
tistically significant. This would represent an ab- 
solute comparison between the groups and it is pos- 
sible that whatever differences may exist might only 
reflect differences in abilities between the groups 
which existed before entering the school. Since the 
question being raised is whether separate prediction 
formulas should be used, a relative comparison should 
be made in which ability is controlled statistically. 
The analysis of covariance developed by Gulliksen 
and Wilks (3) was employed to determine whether 
fleet and recruit personnel having similar scores on 
predictor tests perform similarly or differently with 
respect to school grades.2 

More specifically, the Gulliksen-Wilks method permits 
three hypotheses to be tested: HA, that the 
standard errors of estimating the criterion from a 
specified predictor(s) are equal for fleet subjects and 
recruit subjects; HB, that the slopes of the regression 
lines or planes, that is, the regressions of criterion on 
predictor(s), are the same for the two groups; and HC, 
that the criterion intercepts of the regression lines or 
planes are the same for the two groups. HB assumed 
that HA was supported; HC, in turn, assumed 
that HB was supported. If HC was not rejected, 
then it is legitimate to consider that the same 
prediction formula could be used with both groups. If 
HC was rejected, and HA and HB supported, then the 
two groups differ by a constant amount which would 
be incorporated in the prediction formulas. If either 
HA or HB was rejected, then separate prediction 
formulas are warranted. 
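
The sequence of tests on slopes and intercepts can be illustrated with a simplified sketch for one predictor and two groups. It uses the classical analysis-of-covariance F ratios for equal slopes and equal intercepts, which is an assumption of this example: it is not the Gulliksen-Wilks computation itself, it does not test equality of the errors of estimate, and all data values and function names are hypothetical.

```python
def _sums(x, y):
    """Return (Sxx, Sxy, Syy), the centered sums of squares and products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    return sxx, sxy, syy


def compare_regressions(groups):
    """groups: list of (x, y) pairs of lists. Returns (F_slopes, F_intercepts)."""
    k = len(groups)
    n_total = sum(len(x) for x, _ in groups)
    parts = [_sums(x, y) for x, y in groups]
    # Residual sum of squares with a separate line for each group.
    sse_separate = sum(syy - sxy ** 2 / sxx for sxx, sxy, syy in parts)
    # Residual with a common slope but separate intercepts.
    wxx = sum(p[0] for p in parts)
    wxy = sum(p[1] for p in parts)
    wyy = sum(p[2] for p in parts)
    sse_parallel = wyy - wxy ** 2 / wxx
    # Residual with one line for everybody.
    all_x = [v for x, _ in groups for v in x]
    all_y = [v for _, y in groups for v in y]
    txx, txy, tyy = _sums(all_x, all_y)
    sse_single = tyy - txy ** 2 / txx
    f_slopes = ((sse_parallel - sse_separate) / (k - 1)) / (
        sse_separate / (n_total - 2 * k))
    f_intercepts = ((sse_single - sse_parallel) / (k - 1)) / (
        sse_parallel / (n_total - k - 1))
    return f_slopes, f_intercepts


def choose_formula(reject_a, reject_b, reject_c):
    """Decision rule described in the text, given the three test outcomes."""
    if reject_a or reject_b:
        return "separate prediction formulas"
    if reject_c:
        return "common slopes with a constant added for one group"
    return "single prediction formula"


# Hypothetical scores: similar slopes, fleet line about five points higher.
recruit = ([50, 55, 60, 65, 70], [55.5, 57.0, 60.0, 63.0, 64.5])
fleet = ([45, 55, 60, 65, 75], [57.0, 63.0, 65.0, 67.0, 73.0])
f_slopes, f_intercepts = compare_regressions([recruit, fleet])
print(f"F for slopes = {f_slopes:.2f}, F for intercepts = {f_intercepts:.2f}")
print(choose_formula(reject_a=False, reject_b=False, reject_c=True))
```

Each F ratio would be referred to a table of F with 1 degree of freedom in the numerator for two groups. The outcome of the test on errors of estimate is passed to `choose_formula` as a flag because this sketch does not compute it.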


Results and Discussion 


In order to facilitate presentation of the re- 
sults and appropriate discussion, this section 
is divided into three parts: (a) the relation- 
ship between the criteria and selected pre- 
dictor variables, (b) the influence of age and 
education on the relations between tests and 
criterion measures, and (c) factors associated 
with the better performance of fleet personnel. 
The nucleus for these parts is a table,3 
which contains the intercorrelations of the 

2 The computational procedure was that developed 
by Ledyard R Tucker, appearing in the appendix of 
Adjustment to college by N. Frederiksen and W. B. 
Schrader. Princeton, N. J.: Educational Testing 
Service, 1951. 

3 This table will be supplied by the author on re- 
quest. 




predictor, experimental, and criterion vari- 
ables, together with their means and stand- 
ard deviations. 

(a) Relationship between the criteria and 
selected predictor variables. The means and 
standard deviations for the recruit subjects 
and the fleet subjects were approximately the 
same on GCT, ARI, CA, and SPA. Recall 
that the recruit subjects were selected from 
individuals whose MECH scores were above 
55 while the fleet subjects had MECH scores 
extending below this critical point. Thus, the 
finding that the recruit subjects had a higher 
mean and a smaller standard deviation on this 
test was anticipated. 

Both final grades and AWPT had higher 
means and smaller standard deviations for 
the fleet subjects than for the recruit sub- 
jects. In order to make a relative compari- 
son of the scholastic achievement for the two 
groups, several analyses of covariance were 
undertaken in which final grades or AWPT 
were set as the criterion, and the regressions 
of the criterion on various combinations of 
predictor variables were studied. The prob- 
ability that the differences obtained between 
the recruit subjects and fleet subjects may 
have occurred as chance deviations from a 
true difference of zero was determined for 
each of the three hypotheses tested. In ad- 
dition to the levels of significance of the dif- 
ferences, the magnitudes of the differences 
were also determined for the cases in which 
HA and HB were accepted. These magnitudes, 
which represent the distances between 
the criterion intercepts, were expressed in 
terms of pooled standard error-of-estimate 
units and transformed into estimates of the 
percentage of fleet subjects excelling the av- 
erage recruit subject. The percentage thus 
estimates the advantage the fleet subjects had 
over the recruit subjects after differences in 
predictor scores have been taken into consid- 
eration. Table 1 summarizes these findings. 
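
The transformation from a distance between intercepts to a percentage of fleet subjects excelling the average recruit can be sketched as follows. The sketch assumes that errors of estimate are normally distributed, so that the percentage is the area of the unit normal curve below the distance; the study does not spell out this step, and the function name is hypothetical.

```python
from statistics import NormalDist


def percent_exceeding(distance):
    """Per cent of the higher group expected to exceed the other group's
    regression line, for an intercept distance expressed in standard
    error-of-estimate units (normal errors assumed)."""
    return 100.0 * NormalDist().cdf(distance)


for d in (0.0, 0.36, 0.63):
    print(f"distance {d:.2f} -> {percent_exceeding(d):.0f} per cent")
```

Under this assumption a distance of .36 corresponds to about 64 per cent and a distance of .63 to about 74 per cent, while a distance of zero gives the 50 per cent expected when the groups do not differ.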

From Table 1 it becomes rather apparent 
that the regressions of both of the criteria 
upon various predictor variables resulted in 
all cases in either a significant difference in 
the slopes (HB) or in the criterion intercepts 
(HC) of the regression planes of the fleet 
subjects and the recruit subjects. It is also 
apparent that the fleet subjects had a definite 
advantage over the recruit subjects—in gen- 
eral, about 72 per cent of the fleet subjects 





exceeded the average recruit subject in terms 
of grades earned in the school. (If there 
were no difference, 50 per cent of the fleet 
subjects would, of course, exceed the average 
recruit subject.) The analysis failed to demonstrate 
a significant difference between the 
standard errors of estimating the criterion 
from the predictor(s), although two of the 
p values suggest that the standard errors of 
estimate of the two groups were more alike 
than would normally be attributable to chance 
fluctuations. 


Table 1 

Differences Between Grades Earned by Fleet and Recruit Personnel with Ability Held Constant 

                                                                                               Fleet          Per Cent Fleet 
                                           HA (errors                         HC               Advantage in   Exceeding 
Criterion      Predictor(s)                of estimate)      HB (slopes)      (intercepts)     σest Units     Average Recruit 

Final grades   MECH                        .30 > p > .20     .20 > p > .10    .01 > p          .63            74 
Final grades   GCT                         .98 > p > .95     .01 > p          (.01 > p)        (HB rejected) 
Final grades   ARI and MECH                .50 > p > .30     .20 > p > .10    .01 > p          …              … 
Final grades   GCT and MECH                .95 > p > .90     .01 > p          (.01 > p)        (HB rejected) 
Final grades   GCT, ARI, MECH, CA, SPA     .99 > p > .98     .10 > p > .05    .01 > p          .62            … 
AWPT*          MECH                        .10 > p > .05     .30 > p > .20    .01 > p          .63            … 
AWPT           GCT                         .50 > p > .30     .05 > p > .02    .01 > p          .61            … 
AWPT           ARI and MECH                .20 > p > .10     .50 > p > .30    .01 > p          .57            … 
AWPT           GCT and MECH                .50 > p > …       .05 > p > .02    .01 > p          .66            … 
AWPT           GCT, ARI, MECH, CA, SPA     .50 > p > .30     .30 > p > .20    .01 > p          .64            … 

* Average grades on weekly performance tests. 


Essentially this means that if the prediction 
of final grades were based upon MECH 
scores only, then for a given score on MECH 
two predicted grades might be considered, 
corresponding to the two regression lines. If 
the subject under consideration came from a 
recruit training center, we would estimate his 
most likely final grade as the lower of the 
two possible grades. On the other hand, for 
a fleet subject we would use the higher 
regression line to obtain an estimate of his 
most likely school grade. We might note 
that were all predictions to be based upon 
the regression line for the fleet subjects, the 
recruits would tend to earn grades in school 
below their predicted grade, and hence would 
be classified as “underachievers,” whereas the 
converse would occur if we employed only 
the regression line for the recruit subjects. 

To summarize thus far, the results indicate 
that fleet subjects earned higher grades, both 
final grades and weekly performance grades, 
in a Torpedoman’s Mates School than did 
recruit subjects of similar ability as measured 
by tests from the Navy Basic Battery. These 
findings strongly favor separate selection 
procedures for the two groups; if a cutting score 
is to be used, different critical scores should 
be used for the two groups. The findings also 
suggest that there were factors associated with 
fleet duty which led to better performance in 
this training school. 

(b) The influence of age and education on 
the relations between test and criterion meas- 
ures. Age and education were two factors 
which might be associated with the better 
performance of the fleet subjects. Analysis 
of these factors showed that both variables 
had low but significant correlations with final 
grades and AWPT.4 

The data show that the fleet subjects were 
somewhat older and their ages spread over a 
longer span than the recruits. With respect 
to education the two groups were about the 
same. The means and standard deviations of 
grades and also the correlations with school 
performance were essentially alike for the 
two groups. 

Age and education were combined with 
MECH scores and the multiple regressions 
of the criteria on these variables were ana- 
lyzed by the covariance method discussed 
earlier. The results are reported in Table 2 
and indicate that the fleet subjects earned 
higher grades than recruit subjects of similar 
ability and background on these variables. 
The essential difference between the regres- 
sion planes lies in their criterion intercepts— 
the distance between intercepts was .36 of a 
standard error-of-estimate unit. This dis- 
tance indicates that approximately 64 per 


4 These correlations are reported in the table avail- 
able from the author. See footnote 3. 





Table 2 

Differences Between Grades Earned by Fleet and Recruit Personnel with Mechanical Ability, 
Age, and Education Held Constant 

                                                                                       Fleet          Per Cent Fleet 
                                   HA (errors                         HC               Advantage in   Exceeding 
Criterion      Predictor(s)        of estimate)      HB (slopes)      (intercepts)     σest Units     Average Recruit 

Final grades   MECH, Age, Educ.    .50 > p > .30     .50 > p > .30    .05 > p > .02    .37            … 
AWPT*          MECH, Age, Educ.    .10 > p > .05     .50 > p > .30    .05 > p > .02    .36            64 

* Average grades on weekly performance tests. 


cent of the fleet subjects earned higher grades 
than the average recruit earned. 

(c) Factors associated with the better per- 
formance of fleet personnel. What factors 
then might have accounted for the better per- 
formance of subjects coming from the fleet? 
Although the study did establish that fleet 
personnel performed better in school than re- 
cruit personnel when controlled on a number 
of factors, it was obviously beyond the scope 
of the present study to identify the factors 
leading to the better performance of fleet per- 
sonnel. Yet it would seem extremely desir- 
able to undertake research to isolate the fac- 
tors associated with the better performance of 
the fleet subjects. Judging from the results 
obtained by Frederiksen and Schrader (2) in 
which they found a tendency for veterans to 
overachieve in college, we might expect to 
find that fleet subjects enrolled in most all 
types of Navy training schools will perform 
better relative to ability than recruit subjects. 
If so, the crucial factors contributing to bet- 
ter performance may have an influence upon 
training procedures and policies. Relevant 
factors might include motivation, transfer ef- 
fects from fleet experience, and adjustment 
to Navy life. 


Summary and Recommendations 


This study dealt with the general problem 
of whether or not separate selection pro- 
cedures (prediction formulas) should be uti- 
lized when applicants come from different 
sources. Numerous situations of this type 
may be found in industrial, educational, and 
military settings. The present study was 
concerned with the prediction of success in a 
Torpedoman’s Mates School, which draws its 


students either from the fleet or from recruit 
training centers. The primary purpose was 
to determine what influence this background 
difference had upon the relationships between 
test scores and measures of success in that 
school. The findings show that students from 
the fleet earned higher grades in the Torpedo- 
man’s Mates School than did recruit students 
when scores on tests from the Navy’s Basic 
Battery were taken into account. This dif- 
ference in background definitely influenced 
the relationships between test scores and 
performance in the school, and to such a degree 
that separate selection procedures for the two 
groups appear advisable. The results of the 
study suggest that there were factors asso- 
ciated with fleet duty which contribute to 
better performance in the school. 

Further studies are recommended to deter- 
mine (a) whether fleet subjects perform better 
in other Navy schools than do recruit sub- 
jects of similar ability, and, if this is true, 
(b) what factors are operating to account for 
the better performance of fleet subjects, and 
(c) whether there might exist an optimal ex- 
posure to Navy life which would tend to 
maximize the benefits from Navy training. 


Received October 4, 1954. 


References 


1. Allison, R. B., Jr. Learning scores as predictors. 
Amer. Psychologist, 1954, 9, 320. (Abstract) 

2. Frederiksen, N., & Schrader, W. B. The aca- 
demic achievement of veteran and nonveteran 
students. Psychol. Monogr., 1952, 66, No. 15 
(Whole No. 347). 

3. Gulliksen, H., & Wilks, S. S. Regression tests 
for several samples. Psychometrika, 1950, 15, 
91-114. 





The Journal of Applied Psychology 
Vol. 39, No. 6, 1955 


Differentiation of Successful and Unsuccessful Premedical 
Students 


Richard S. Melton 


University of Minnesota 1 


One of the most widely studied topics of 
educational psychology has been that of pre- 
dicting academic success. Literally hundreds 
of such studies have been reported, most of 
them dealing with correlations between previ- 
ous achievement, aptitude test scores, and 
some measure of success like grade-point av- 
erage or honor-point ratio. Few, however, 
have concerned themselves with more remote 
criteria, such as graduation from college, pro- 
fessional success, etc. Usually these latter 
criteria are the more significant, but they are 
difficult, of course, to predict with existing 
methods. 

One such criterion is acceptance by a professional 
college. If, for example, it could be 
predicted at the beginning or end of the freshman 
year in college whether a given preprofessional 
student had good or poor chances of 
being accepted by the college of his choice, a 
great deal of presently misguided effort could 
be rechanneled. (Nondirective counselors may 
disagree on this point, but it is an open question 
as to which kind of “direction” is worse: 
friendly advice from a freshman counselor or 
the impersonal rejection notice from the professional 
college.) 

Premedical students often present one of 
the most difficult problems, for in recent 
years the ratio of applicants to acceptances in 
American medical schools has run as high as 
ten to one (3). This ratio is spuriously high 
to the extent that it is a common practice to 
submit an application to more than one medi- 
cal school, but there is still a sizable num- 
ber of students who are not accepted by any 
school. 

The attrition at application time is not the 
only attrition, however, for many premedical 
aspirants never reach that stage. After a 
year or two of college many premeds sense 


1 Now at Crew Research Laboratory, Air Force 
Personnel and Training Research Center, Randolph 
Field, Texas. 


the futility of continuing, and either leave 
school or change their educational objectives. 
These students also present difficult prob- 
lems for the educational counselor. In view 
of the common frustrations of these students, 
it would appear highly desirable to be able 
to realistically advise these people of their 
chances of admission during the early part of 
their premedical curriculum. 

For these reasons, a study was made of a 
group of 102 male, nonveteran premedical 
freshmen who entered the College of Science, 
Literature and the Arts of the University of 
Minnesota in the summer and fall of 1949. 
Their records were evaluated at three levels: 
at the end of the freshman year, at the be- 
ginning of the sophomore year, and at the 
time of admission to the Medical School. 


First-Year Success 


At the end of the freshman year, a number of in- 
tellective and nonintellective variables were related 
to the first available criterion of success as a premed, 
the honor-point ratio (HPR). As might be 
expected, high-school performance was the best predictor 
(r = .58). The 1947 ACE test yielded an r 
of .41, and the Cooperative English Test correlated 
.51. Combined into a multiple-regression equation, 
they yielded an R of .65. 

These correlations were considerably higher than 
the coefficients currently being found for the total 
SLA freshman class. In view of the selected sample 
and attendant restriction of range on the predictor 
variables, these results are all the more significant, 
for they lend support to the thesis of Wagner and 
Strabel (8) that homogeneous groupings may im- 
prove academic prediction considerably. 

A number of nonintellective variables including the 
Strong Vocational Interest Blank, socioeconomic and 
educational background, and personal history data 
were also tested for predictive utility. The Strong 
variables used (Physician scale, Occupational Level, 
Interest Maturity, and Group I patterning) showed 
no relationship to the criterion. In fact, the Physician 
scale correlated -.03. This is discouraging 
evidence for proponents of that test, although one 
year of academic performance may be an inappro- 
priate validation criterion. Restriction of range is 
not the cause of the insignificant relationship either, 




for the standard scores ranged in a fairly normal 
fashion from 3 to 62. The correlations for OL and 
IM were not computed because chi-square tests 
showed they were not related to the criterion. Simi- 
larly, a chi-square test relating Group I pattern to 
HPR showed no significant results. 

Among the socioeconomic and educational history 
variables, no significant relationships were found. 
Students whose fathers were in trades or technical 
work tended to be slightly less successful than the 
sons of business, managerial, and professional men, 
and there was a small inverse relationship between 
the student’s stated certainty of his vocational choice 
(medicine) and success during the first year. Stu- 
dents whose homes were in the local area (Min- 
neapolis and St. Paul) and students from public 
schools tended to be slightly more successful than 
other groups, but none of these differences was sta- 
tistically significant. 

The most practical result of the study, for counselors 
and administrators, was an arithmetic average 
of high-school rank, ACE percentile, and Cooperative 
English percentile (hereafter referred to as a 
Medical Aptitude Rating). Although such an aver- 
age is mathematically unjustified because of the in- 
equality of percentile units, such a procedure can be 
defended on practical grounds. Adopting an aver- 
age percentile of 80 as a cutting point, 42 students 
fell below it and 60 above it. Of the 42 below this 
cutting point only five (12 per cent) achieved a satis- 
factory honor-point ratio while 67 per cent of those 
above it were successful. (Satisfactory honor-point 
ratio was defined as 1.75 or better, roughly a B— 
grade, since this is an HPR below which few if any 
applicants are ever accepted by the Medical School. 
An HPR of 1.75 is a minimal definition of success.) 

From the viewpoint of the educational counselor, 
a cutting point of 80 would be useful in that it 
could be used at the beginning of the freshman year 
to identify those students who are not good pros- 
pects for the premedical curriculum. Identification 
of the successes is not so simple, however, for only 
two-thirds of those whose averages were above 80 
were successful. 
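
The arithmetic behind the cutting point can be laid out as a small fourfold table. In the sketch below the counts of 42 and 60 students and the five successes below the cutting point come from the text; the figure of 40 successes above the cutting point is inferred from the reported 67 per cent and from the total of 45 students with a satisfactory HPR, and the function name is hypothetical.

```python
def cutting_score_summary(n_below, success_below, n_above, success_above):
    """Success rates on each side of a cutting point, plus the share of
    students whose outcome agrees with the side they fell on."""
    total = n_below + n_above
    agreeing = (n_below - success_below) + success_above
    return {
        "pct_success_below": 100.0 * success_below / n_below,
        "pct_success_above": 100.0 * success_above / n_above,
        "pct_agreeing": 100.0 * agreeing / total,
    }


# 40 successes above the cutting point is an inferred count (see lead-in).
summary = cutting_score_summary(n_below=42, success_below=5,
                                n_above=60, success_above=40)
for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

The asymmetry discussed in the text shows up directly: falling below the cutting point is a far more dependable sign of failure than falling above it is of success.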




Sophomore Registration 


The second part of the study involved a check of 
the sophomore registration blanks of the sample. Of 
those 45 students who had an HPR ≥ 1.75, 42 (93 
per cent) had continued in the premedical curricu- 
lum, while of the 57 students with HPR < 1.75 only 
21 (37 per cent) still indicated a premedical choice. 
Both first-year HPR and the Medical Aptitude Rat- 
ing were also significantly related to continuing in 
the curriculum. The Physician scale and the Group 
I pattern on the Strong showed no such relation- 
ships, however. 
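
The relation between first-year HPR and continuation can be examined from the counts given above. The sketch below applies an uncorrected chi-square test to the fourfold table; the choice of this particular test is an assumption of the example, since the text does not say which test was applied at this point, and the function name is hypothetical.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Uncorrected chi-square (1 degree of freedom) for the table
    [[a, b], [c, d]], with its upper-tail probability."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value


# Rows: HPR >= 1.75 and HPR < 1.75.
# Columns: still premedical, no longer premedical.
chi2, p_value = chi_square_2x2(42, 3, 21, 36)
print(f"chi-square = {chi2:.2f}, p = {p_value:.1e}")
```

Of the 45 students with a satisfactory HPR, 42 continued and 3 did not; of the 57 others, 21 continued and 36 did not.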


Acceptance by Medical School 


The final criterion, acceptance by the Medical 
School, was available two years later. Complete 
data were available on 100 of the original 102 cases. 
Forty-five had been accepted, a second group of 32 
were still in an undergraduate college (most of them 
in different majors), and a third group of 23 were 
no longer enrolled in the university. It is possible 
that some of the latter group might have been ac- 
cepted by other medical schools, but this is unlikely 
since most of them were students whose first-year 
honor-point ratios were quite unsatisfactory. (The 
average was 0.75 and only one student had a fresh- 
man HPR over 1.50.) No evidence could be found 
that any of these individuals had been accepted by 
any other medical school. 

To determine whether these three groups could be 
differentiated, and hence predicted, the generalized 
distance function (D²) was used with the five variables: 
high-school rank, ACE score, Cooperative 
English score, Physician scale of the Strong, and 
first year honor-point ratio. Using the method of 
pivotal condensation, the significance of each vari- 
able could be assessed as it was added. Thus, the 
first four variables could be used at the time of 
freshman matriculation, and first-year honor-point 
ratio could be added as a predictor at the end of the 
first year. 
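
The generalized distance can be illustrated for the two-group case. The sketch below computes D² as the squared difference between group mean vectors weighted by the inverse of the pooled within-group covariance matrix, and shows how D² changes when a variable is added. It is a simplified illustration under stated assumptions: the study compared three groups and tested each increment for significance, which is not done here, and all data and function names are hypothetical.

```python
def mean_vector(rows):
    """Column means of a list of equal-length rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def scatter_matrix(rows):
    """Sums of squares and cross-products about the group means."""
    m = mean_vector(rows)
    p = len(m)
    s = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [v - mu for v, mu in zip(r, m)]
        for i in range(p):
            for j in range(p):
                s[i][j] += d[i] * d[j]
    return s


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def d_squared(g1, g2):
    """Generalized distance between two groups (pooled covariance)."""
    p = len(g1[0])
    s1, s2 = scatter_matrix(g1), scatter_matrix(g2)
    dof = len(g1) + len(g2) - 2
    pooled = [[(s1[i][j] + s2[i][j]) / dof for j in range(p)] for i in range(p)]
    diff = [a - b for a, b in zip(mean_vector(g1), mean_vector(g2))]
    return sum(d * w for d, w in zip(diff, solve(pooled, diff)))


# Hypothetical scores on two variables for two small groups.
group_1 = [[0, 0], [2, 0], [0, 2], [2, 2]]
group_2 = [[3, 0], [5, 0], [3, 2], [5, 2]]
d2_first = d_squared([[r[0]] for r in group_1], [[r[0]] for r in group_2])
d2_both = d_squared(group_1, group_2)
print(f"D2 with first variable = {d2_first:.2f}")
print(f"D2 with both variables = {d2_both:.2f}")
print(f"increase from adding the second variable = {d2_both - d2_first:.2f}")
```

In this made-up example the groups differ only on the first variable, so adding the second variable leaves D² unchanged; a variable that contributes to the discrimination would raise it.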

The means and standard deviations of the five variables are 
given in Table 1. The 45 who were accepted by the 


Table 1 

Means and Standard Deviations of the Three Groups of Premedical Students 

                               Group I               Group II              Group III 
                               Accepted by           Changed Major         No Longer 
                               Medical School                              Enrolled 
                               (N = 45)              (N = 32)              (N = 23) 

Variable                       Mean       SD         Mean       SD         Mean       SD 

Physician scale                35.98      13.84      …          …          40.30      13.04 
High school rank               91.27      10.73      …          …          73.35      18.13 
ACE score                      86.80      14.71      …          …          68.91      28.65 
Cooperative English score      84.29      11.69      …          …          64.96      24.39 
First-Year HPR X 100           205.64     46.42      …          …          74.70      45.65 











Medical School had higher mean scores on all variables
except the Physician scale. On that scale they
had a lower mean score than either of the other
groups, and hence in a prediction equation the Physician
scale would be weighted negatively, if at all.
To give it an optimal chance to discriminate in the
$D^2$ analysis, it was entered as the first variable, but
as Table 2 shows, it made no significant contribution.
Thus, the Physician scale seemingly has no
utility, at the University of Minnesota, for the prediction
of any of the criteria used in this investigation.

The variables that did add to the discrimination
were high-school rank (p < .01), ACE (p < .05),
and first-year honor-point ratio (p < .001). That
the latter contributed some unique variance in spite
of its high correlations with HSR and ACE is evident
both in the size of the F ratios (after all the
other variables had been added), and in the fact
that it was the only variable which discriminated between
Categories II and III (those who remained in
school with a different major and those who were
no longer enrolled).


Table 2

Discriminatory Ability of the Five Predictor Variables²

                             Categories
Variable                     Compared     $D^2_p$     $D^2_{p+1}$   $D^2_{p+1} - D^2_p$   F

Physician scale              I and II                 0.002254      0.002254              [illegible]
                             I and III                0.111223      0.111223              [illegible]
                             II and III               0.081808      0.081808              [illegible]

High school rank             I and II     0.002254    0.614247      0.611903              [illegible]
                             I and III    0.111223    1.324227      1.213004              [illegible]
                             II and III   0.081808    0.183605      0.101797              1.333

ACE score                    I and II     0.614247    0.910856      0.296609              4.857*
                             I and III    1.324227    1.672921      0.348694              4.304*
                             II and III   0.183605    0.185711      0.002106              0.027

Cooperative English score    I and II     0.910856    1.006308      0.095452              1.471
                             I and III    1.672921    1.804935      0.132014              1.542
                             II and III   0.185711    0.188669      0.002958              0.037

First-Year HPR               I and II     1.006308    1.951203      0.944895              14.189***
                             I and III    1.804935    7.290779      5.485844              62.366***
                             II and III   0.188669    2.065997      1.877328              23.475***

* Significant at the .05 level.
** Significant at the .01 level.
*** Significant at the .001 level.


2 To test the contribution of a variable to the discrimination
between groups the value $D^2_{p+1} - D^2_p$ is
computed. This gives the amount of increase in $D^2$
which that variable makes. $D^2_p$ refers to the size of
$D^2$ before that variable was added and $D^2_{p+1}$ gives
the size of $D^2$ after it has been added. The significance
of the addition is found from a formula which
can be used with the F table. The reader who is interested
in the calculation of $D^2$ values is referred to
references (1) and (2). Both give full computational
outlines.
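The footnote's test of an added variable can be sketched numerically. The article does not print the formula, so the sketch below assumes the form usually given for Rao's test of the additional information carried by one new variable, and it assumes the within-group degrees of freedom are 100 cases minus 3 groups. The function and parameter names are illustrative only. Under those assumptions the result comes out close to the tabled F ratios for the three rows tried here.

```python
# Assumed form of the test (not printed in the article):
#   F = (f - p) * c2 * (D2_after - D2_before) / (f + c2 * D2_before)
# where c2 = n1 * n2 / (n1 + n2), f = within-group degrees of freedom,
# and p = number of variables already in the function.

GROUP_N = {"I": 45, "II": 32, "III": 23}            # group sizes from the text
DF_WITHIN = sum(GROUP_N.values()) - len(GROUP_N)    # 97, an assumption


def added_variable_f(d2_before, d2_after, n1, n2, p_before, df_within=DF_WITHIN):
    """F ratio for the increase in D^2 when one variable is added."""
    c2 = n1 * n2 / (n1 + n2)
    numerator = c2 * (d2_after - d2_before)
    denominator = df_within + c2 * d2_before
    return (df_within - p_before) * numerator / denominator


# First-year HPR entered fifth (four variables already in), Categories II and III.
f_hpr_ii_iii = added_variable_f(0.188669, 2.065997, GROUP_N["II"], GROUP_N["III"], 4)

# First-year HPR, Categories I and II.
f_hpr_i_ii = added_variable_f(1.006308, 1.951203, GROUP_N["I"], GROUP_N["II"], 4)

# ACE score entered third (two variables already in), Categories I and II.
f_ace_i_ii = added_variable_f(0.614247, 0.910856, GROUP_N["I"], GROUP_N["II"], 2)

print(round(f_hpr_ii_iii, 3), round(f_hpr_i_ii, 3), round(f_ace_i_ii, 3))
```

Each F would be referred to the F table with 1 and f - p degrees of freedom, again under the assumed form of the test.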


Discussion 


Prediction of an event is obviously necessary
before any attempt at control can be
made. Prediction is the job of the scientist 
while control is often left to administrators, 
perhaps fortunately. The prediction of ac- 
ceptance by a professional school at the be- 
ginning (or even end) of the freshman year 
could permit rechanneling of a great deal of 
academic effort and might allow more real- 
istic planning early in the career of many stu- 
dents. The present study shows that such 
prediction is possible. At the beginning of 
the freshman year, the high-school rank and 
ACE score could be used as guides, and at the 
end of the freshman year the HPR could be 
added to the equation. These equations, and 
the variables employed, would probably vary 
somewhat from school to school and from 
year to year, but it does appear that at least 
at the University of Minnesota such predic- 
tion is possible. Similar studies should be 
done at other universities and should be re- 
peated fairly regularly, at least to the extent 
that there are changes in the student popula- 
tions, economic conditions, admission policies 
of the medical schools, etc. It is even possible
that the selection of medical students
could be done earlier than it is at present, 
perhaps sometime during the sophomore or 
junior year. 

With regard to the negative results of the 
Physician scale of the Strong, it is impor- 
tant to note that Strong had hypothesized 
that interests may correlate significantly with 
achievement when that achievement involves 
performance over a considerable period of 
time (4). The results reported here cer- 
tainly do not substantiate his hypothesis; in 
fact there was a slight inverse relationship. 
Studies of the relation of interest scores to 
achievement measures have been notably in- 
consistent (4, 5, 6, 7), such that the status of 
Strong’s hypothesis remains in doubt. Per- 
haps only under certain specific conditions 
will the hypothesis be confirmed. 


Summary 


The performance of a group of 102 male 
nonveteran premedical students who enrolled 
at the University of Minnesota in the sum- 
mer and fall of 1949 was studied over a pe- 
riod of three years, i.e., until they had an 
opportunity to apply for and be accepted by
the Medical School. Analysis of both intellective
and nonintellective factors was made
to determine the predictability of (a) the
first-year honor-point ratio, (b) continuation
in the premedical curriculum, and (c) acceptance
by the medical school. High-school
ranks, ACE test scores, and Cooperative Eng- 
lish test scores were all found to be useful 
predictors. Essentially negative results were 
found for all nonintellective variables, includ- 
ing the Physician scale of the Strong Voca- 
tional Interest Blank. 


Received December 14, 1954. 


References 


1. Melton, R. S. The generalized distance function:
a classification technique for the biological and
social sciences. U.S. Naval Sch. Aviat. Med.
Rep., 1953, No. NM 001 057.16.04.

2. Rao, C. R. Advanced statistical methods in biometric
research. New York: Wiley, 1952.

3. Smiley, D. F., & Zoleski, V. Study of applicants
for admission to U. S. medical colleges—entering
class of September, 1948. J. Ass. Amer.
Med. Coll., 1949, 24, 339-343.

4. Strong, E. K., Jr. Vocational interests of men
and women. Palo Alto: Stanford Univer.
Press, 1943.

5. Strong, E. K., Jr. Twenty year follow-up of
medical interests. In L. L. Thurstone (Ed.),
Applications of psychology. New York:
Harper, 1952.

6. Strong, E. K., Jr., & Tucker, A. C. The use of
vocational interest scales in planning a medical
career. Psychol. Monogr., 1952, 66, No.
9 (Whole No. 341).

7. Super, D. E. Appraising vocational fitness. New
York: Harper, 1949.

8. Wagner, M. E., & Strabel, E. Homogeneous
grouping as a means of improving the prediction
of academic performance. J. appl. Psychol.,
1935, 19, 426-446.





The Journal of Applied Psychology 
Vol. 39, No. 6, 1955 


A Comparison of Parametric and Nonparametric Analyses 
of Opinion Data


Hjalmar Rosen and R. A. Hudson Rosen 


Institute of Labor and Industrial Relations, University of Illinois 


Most researchers in the area of opinion 
have tended to use some type of continuous 
data statistics in their analyses in spite of 
McNemar’s quite devastating critique of such
procedure (2). Data are treated as if they
were normally distributed, and, in those cases 
where responses to the opinion stimuli are 
fixed (the Likert scale design, in particular), 
it is assumed that equal-appearing intervals 
exist between the adjoining response cate- 
gories. 

A relatively small number of researchers 
appear to have been disturbed by McNemar’s 
critical questioning of the validity of the as- 
sumptions upon which their statistical analy- 
ses were based, however. One way in which 
they have attempted to skirt the challenges 
leveled by McNemar is merely to describe 
the response results in terms of percentage 
profiles. Another is to make use of non- 
parametric statistical techniques that are 
based on less rigorous assumptions (3). The 
latter solution, unlike percentage descrip- 
tions, allows for manipulation of the data to 
seek out interrelationships and significant
differences in trends, but is, of course, less 
precise than parametric techniques. More- 
over, many researchers have grown used to 
utilizing mechanical aids such as IBM equip- 
ment, and nonparametrics are not easily dealt 
with by standard IBM machines when large 
N’s are involved. 

The writers were confronted with just such 
a problem: data from a highly structured 
opinion questionnaire, concern with
McNemar’s critique, and an N sufficiently large
to make IBM analysis in terms of nonpara- 
metrics difficult. To determine to what ex- 
tent differing statistical analyses and differ- 
ing underlying assumptions would influence 
the conclusions, it was decided to analyze 
the data using both parametric and nonpara- 
metric techniques. This paper is a report on 
the results of a comparison of the two techniques
on a given set of data. Such a comparison,
of course, is possible only where the
two approaches are able to answer similar 
questions put to the data. 


Background 


The questionnaire data on which the com- 
parison of the two techniques was based were 
obtained in a study of union-member opin- 
ions toward a large, district union (4). The 
district union was composed of 25,000 mem- 
bers unequally divided among 21 local un- 
ions. Because the initial sampling design 
was constructed to give a reliable sample of 
each local union, questionnaires were mailed 
to some 4,000 union members. The question- 
naires were supplemented by an oral adminis- 
tration of the questionnaire to a sample of 
the nonrespondents. The district union com- 
posite used in this study, however, amounted 
to only 607 cases (4). 

Each opinion item in the questionnaire had 
a tripartite design, i.e., each aspect of union 
functioning was probed in terms of stand- 
ards, perceptions, and evaluations. The term 
“standards” referred to an individual’s con- 
ception of what is necessary or proper pro- 
cedure in a given social situation, i.e., his 
beliefs with regard to what ought to be. 
“Perceptions” referred to what an individual 
subjectively experiences as existing in a given 
social situation, whereas “evaluations” re- 
ferred to an individual’s approval or disap- 
proval of a given social situation, in terms of 
the degree to which his standards are per- 
ceived as being met. The standard and per- 
ception statements had response categories 
of: “always,” “usually,” “sometimes,” “sel- 
dom,” and “never,” with the additional cate- 
gory of “don’t know” provided for the per- 
ception statements. Evaluation statements 
had the response alternatives of: “strongly 
agree,” “agree,” “undecided,” “disagree,” and 
“strongly disagree.” 





                                      Standards

                             Always and      Sometimes, Seldom,
Perceptions                  Usually         and Never

Always and Usually               A                  B

Sometimes, Seldom,               C                  D
and Never

Fig. 1. Fourfold table breakdown.




It was felt that this research tool might 
provide some indication of how evaluations 
were derived: (a) that it would reveal a 
tendency for satisfaction to result from simi- 
larities between standards and perceptions 
(hereafter called homogeneity) and for dis- 
satisfaction to result from discrepancies be- 
tween standards and perceptions (hereafter 
called heterogeneity); and (b) that the
direction of the discrepancies between standards
and perceptions could be determined,
i.e., whether respondents thought more should 
be done than they saw being done (had un- 
dermet standards), or whether they thought 
less should be done than they saw being done 
(had overmet standards). 


Methods of Analysis 


The following discussion is limited to an examination
of two evaluation groups, i.e., the satisfied
group (those who answered “strongly agree” and
“agree”) and the dissatisfied group (those who
answered “disagree” and “strongly disagree”). On the
basis of an earlier analysis (5), the undecided group 
was found to have questionable validity and conse- 
quently is omitted from the analysis presented here. 


Nonparametric Analysis 


In the nonparametric analysis, the first step was 
to determine what percentage of respondents an- 
swered in each of the two evaluation categories. 

The next step was a comparison of the satisfied 
and dissatisfied groups in terms of their standard 
characteristics and their perception characteristics. 
These characteristics of the two evaluation groups 
were derived on the basis of a dichotomization of 
the standard and the perception data so that the 
“always” and “usually” categories were combined 
and the “sometimes,” “seldom,” and “never” cate- 
gories were combined. This categorization was used 
because it fit into a fourfold table design that was 
utilized in the homogeneity analysis. Significant differences
were computed using the Cronbach modification (1)
of the significant difference between proportions test.

In comparing the homogeneity characteristics of 
the two evaluation groups, a fourfold table design 
first was used, and an on-the-line analysis then was 
made as a check. 

The “A” and “D” boxes were considered to indi- 
cate homogeneity, the “B” and “C” boxes to indicate 
heterogeneity. Significant differences in homogeneity 
were derived by the same method as discussed earlier 
(1). The alternate on-the-line approach was based 
on a cross-tab breakdown utilizing all categories. 
In this approach, homogeneity was defined as abso- 
lute category comparability, e.g., “always-always,” 
“usually-usually,” etc., and heterogeneity was de- 
fined as any deviation from absolute category com- 
parability. Obviously this was a more rigorous test 
of homogeneity than that provided by the fourfold 
table analysis. Significant differences between the two
groups were computed (1). 
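The two nonparametric definitions of homogeneity can be contrasted in a short sketch. The box letters follow the text (A and D homogeneous, B overmet, C undermet); the function names and the lower-case category labels are illustrative assumptions, not part of the original analysis.

```python
# Response categories combined into the "high" half of the fourfold table.
HIGH = {"always", "usually"}


def fourfold_box(standard, perception):
    """Return the fourfold-table box for one standard-perception pair."""
    standard_high = standard in HIGH
    perception_high = perception in HIGH
    if standard_high and perception_high:
        return "A"   # both in the always/usually half
    if perception_high:
        return "B"   # perception exceeds standard: overmet
    if standard_high:
        return "C"   # standard exceeds perception: undermet
    return "D"       # both in the sometimes/seldom/never half


def homogeneous_fourfold(standard, perception):
    """Fourfold definition: A and D boxes count as homogeneous."""
    return fourfold_box(standard, perception) in ("A", "D")


def homogeneous_on_the_line(standard, perception):
    """On-the-line definition: only identical categories count."""
    return standard == perception


# "always" vs. "usually" is homogeneous in the fourfold table
# but heterogeneous on the line.
pair = ("always", "usually")
print(homogeneous_fourfold(*pair), homogeneous_on_the_line(*pair))
```

The example pair shows why the on-the-line test is the more rigorous of the two: it rejects a one-category discrepancy that the fourfold table accepts.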

Because discrepancies between standards and perceptions
tended to characterize the dissatisfied group,
whereas the satisfied group was consistently homo- 
geneous (4), the nature of dissatisfied group hetero- 
geneity was explored, to give further insight into 
dissatisfaction. Although both the fourfold and on- 
the-line approaches were used in this analysis, a
different method for determining significant differences
had to be used than in the previous analysis because
this step involved a comparison within one evaluation
group rather than between groups. The formula
$(X - N/2)/\sqrt{N/4}$ was used in comparing the
“B” and “C” boxes of the fourfold table (see Fig. 1) 
and the above-the-line and the below-the-line group- 
ing of the cross-tab table (see Fig. 2). “X” was 
defined as the “C” box of the fourfold table or the 
below-the-line groupings of the cross-tab table. 
“N” was equal to the total of the “B” and “C” 
boxes in one case and the above- and below-the-line 
groupings in the other. The “C” boxes and below- 
the-line categories were considered as indicating
“undermet” standards; the “B” boxes and the above-the-
line categories were considered as indicating “over- 
met” standards. 
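The within-group comparison can be illustrated with a minimal sketch, reading the formula as $(X - N/2)/\sqrt{N/4}$. The counts below are hypothetical and the function name is an assumption; only the formula itself comes from the text.

```python
import math


def undermet_vs_overmet_ratio(undermet_count, overmet_count):
    """Critical ratio (X - N/2) / sqrt(N/4) for the undermet count X,
    where N is the total number of heterogeneous responses."""
    n = undermet_count + overmet_count
    if n <= 0:
        raise ValueError("at least one heterogeneous response is required")
    return (undermet_count - n / 2) / math.sqrt(n / 4)


# Hypothetical dissatisfied group: 60 responses in the "C" box (undermet)
# and 30 in the "B" box (overmet).
ratio = undermet_vs_overmet_ratio(60, 30)
print(round(ratio, 2))
```

A positive ratio means that undermet standards outnumber overmet ones; the ratio is zero when the two boxes are equally filled.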


Parametric Analysis 


As far as possible, comparable data were derived
in the parametric analysis. Means were computed
for the evaluation part of each item to compare to
the nonparametric percentages. The undecided group
was omitted from the derivation of the means. The
remaining categories were weighted from one to four
with the “strongly agree” group having the maximum
weight of four.

Fig. 2. On-the-line breakdown. [Cross-tabulation of Standards (Always, Usually, Sometimes, Seldom, Never) by Perceptions.]

As in the nonparametric analysis, the next step 
was to compare the satisfied and dissatisfied groups 
in terms of their standards and then in terms of 
their perceptions. Mean scores of standards and 
mean scores of perceptions were derived for each 
group on each item, and tests of significant differ- 
ence were computed, using the standard formula 
for significant difference between uncorrelated means: 


$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_{x_1}^2}{N_1} + \frac{\sigma_{x_2}^2}{N_2}}.$$

Standard and perception categories were weighted 
from plus two through minus two with the “al- 
ways” norm or perception carrying the maximum 
positive weight. 
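As a rough illustration of this step, the sketch below scores responses with the plus-two-through-minus-two weights and divides the difference between two group means by the standard error of the difference for uncorrelated means. The response lists are hypothetical, the names are illustrative, and dividing the sum of squares by N (rather than N - 1) is an assumption, since the article does not say which was used.

```python
import math

# Weights from the text: "always" carries the maximum positive weight.
WEIGHTS = {"always": 2, "usually": 1, "sometimes": 0, "seldom": -1, "never": -2}


def mean_variance_n(responses):
    """Mean, variance (divisor N, an assumption), and N of weighted responses."""
    scores = [WEIGHTS[r] for r in responses]
    n = len(scores)
    mean = sum(scores) / n
    variance = sum((s - mean) ** 2 for s in scores) / n
    return mean, variance, n


def critical_ratio(group_1, group_2):
    """Difference between means divided by sqrt(var1/N1 + var2/N2)."""
    m1, v1, n1 = mean_variance_n(group_1)
    m2, v2, n2 = mean_variance_n(group_2)
    standard_error = math.sqrt(v1 / n1 + v2 / n2)
    return (m1 - m2) / standard_error


# Hypothetical standards responses on one item.
satisfied = ["always"] * 6 + ["usually"] * 4
dissatisfied = ["usually"] * 5 + ["seldom"] * 5
cr = critical_ratio(satisfied, dissatisfied)
print(round(cr, 2))
```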

A comparison was then made between the satis- 
fied and dissatisfied groups in terms of relative ho- 
mogeneity, defined as the degree to which perfect 
category comparability between standards and perceptions
existed. Mean homogeneity scores were
derived for each group, indicating the amount of 
standard-perception category agreement. Identical 
standard-perception categories were given a weight 
of zero, and weightings of one through four were 
given to nonidentical responses, increasing with the 
degree of category discrepancy that existed. These 
means of the satisfied and dissatisfied groups were 
then subjected to tests of significant difference, using 
the standard formula for uncorrelated means: 


$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_{x_1}^2}{N_1} + \frac{\sigma_{x_2}^2}{N_2}}.$$

The characteristics of the discrepancies of the dis- 
satisfied group then were investigated. Whereas, in 
deriving relative homogeneity, direction of discrep- 
ancies was purposely omitted, in determining dis- 
satisfied group characteristics directionality was in- 
cluded. To be more specific, when standards were 
more positive than perceptions, negative weightings 
were given, and when the reverse was true positive 
weightings were given. The weightings, aside from 
their directionality, were the same as those used in 
deriving relative homogeneity. To determine whether 
the directionality of any discrepancy was signifi- 
cant, the mean was divided by its standard error. 
Standards were defined as “undermet” when the di- 
rection was significantly negative, as “overmet” when 
the direction was significantly positive. 
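The homogeneity weighting and its directional variant can be sketched as follows. The sign convention follows the text (negative when the standard is more positive than the perception). The response pairs are hypothetical, the names are illustrative, and the standard error of the mean is taken here as the sample standard deviation over the square root of N, which is an assumption.

```python
import math

WEIGHTS = {"always": 2, "usually": 1, "sometimes": 0, "seldom": -1, "never": -2}


def signed_discrepancy(standard, perception):
    """Negative when the standard is more positive than the perception."""
    return WEIGHTS[perception] - WEIGHTS[standard]


def homogeneity_score(standard, perception):
    """0 for identical categories, 1 through 4 as the discrepancy grows."""
    return abs(signed_discrepancy(standard, perception))


def mean_over_standard_error(values):
    """Mean divided by its standard error (sample SD / sqrt(N), an assumption)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean / math.sqrt(variance / n)


# Hypothetical (standard, perception) pairs for a dissatisfied group.
pairs = [
    ("always", "sometimes"), ("always", "usually"), ("usually", "seldom"),
    ("always", "always"), ("usually", "sometimes"), ("sometimes", "usually"),
]
directional = [signed_discrepancy(s, p) for s, p in pairs]
direction_ratio = mean_over_standard_error(directional)
print(directional, round(direction_ratio, 2))
```

A clearly negative ratio would correspond to “undermet” standards in the article's terms, and a clearly positive one to “overmet” standards.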


Comparison of Results Obtained by Differing 
Techniques 


Two approaches were used in comparing the non- 
parametric and parametric results. The first simply 
was an inspection of the significant difference re- 
sults obtained by each method for standard, percep- 
tion, and homogeneity comparisons of the satisfied 




and dissatisfied groups and for undermet-overmet 
standard characteristics of the dissatisfied group, and 
a noting of how many items yielded comparable sig- 
nificances (differences between the one and five per 
cent levels of confidence were ignored). The second 
step was a series of rank difference correlations 
($\rho = 1 - 6\sum D^2 / [N(N^2 - 1)]$) between the parametric
means and nonparametric percentages. An
additional comparison was made, between the two
nonparametric techniques, using the same methods. 
This was done to determine whether the variation 
between two techniques based upon the same statisti- 
cal assumptions would be as great as that occurring 
between techniques based on differing statistical as- 
sumptions. 
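The rank difference correlation used in this second step can be computed as in the sketch below. The item means and percentages are hypothetical, and giving tied values their average rank is an assumption about how ties would be handled.

```python
def average_ranks(values):
    """1-based ranks in ascending order, with ties given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1   # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def rank_difference_correlation(x, y):
    """rho = 1 - 6 * sum(D^2) / (N * (N^2 - 1)), D = difference in ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists of at least two items")
    n = len(x)
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical item-level results: parametric means and fourfold percentages.
item_means = [3.2, 2.9, 3.8, 1.5, 2.1]
item_percentages = [70, 65, 80, 40, 35]
rho = rank_difference_correlation(item_means, item_percentages)
print(round(rho, 2))
```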


Results Obtained 


The inspectional comparison of significant 
difference data obtained by parametric and 
nonparametric analysis indicated that: (a) 
with respect to differences in standards be- 
tween the satisfied and dissatisfied groups, 
the results were equivalent in 25 out of the 
27 cases; (b) with respect to perception differences
between the satisfied and dissatisfied
groups, the results were equivalent in 25 of 
the 27 cases; (c) with respect to differences 
in homogeneity between the satisfied and dis- 
satisfied groups, the results were equivalent 
in 25 of the 27 cases regardless of which non- 
parametric technique was used; (d) with re- 
spect to the undermet-overmet standard char- 
acteristics of the dissatisfied group, using the 
on-the-line technique results were equivalent 
in all cases, and using the fourfold approach 
results were equivalent in all but one of the 
27 cases. 

Turning to the rank difference correlations 
(see Table 1), it was found that the relation- 
ships between parametric and nonparametric 
findings were relatively high, with the excep- 
tion of the relationships between mean and 
fourfold percentages relevant to homogeneity, 
which were moderate.¹

These correlations seem to indicate that,
with the exception of homogeneity, the parametric
and nonparametric techniques would
lead to comparable interpretations of the data.
With respect to homogeneity, it is interesting
to note that the differences between the
results of the two nonparametric techniques
were greater than those resulting from a comparison
of the mean and on-the-line analyses.
It may also be noted that the fourfold percentages
differed from both the on-the-line
percentages and the means to about the same
degree. From this it would appear that the
interpretations resulting from the parametric
analysis would be as comparable to interpretations
from the nonparametric treatment
as would be the interpretations resulting from
the two nonparametric analyses.


Table 1

Rank Difference Correlations Between Data
Derived by Various Techniques

                                              Satisfied    Dissatisfied
Technique                                     Group        Group

Norms
  Mean vs. fourfold percentages               +.99         +.99
Perceptions
  Mean vs. fourfold percentages               +.85         +.98
Evaluations
  Mean vs. percentages                        +.90         +.90
Homogeneity
  Mean vs. fourfold percentages               +.66         +.65
  Mean vs. on-the-line percentages            +.87         +.84
  Fourfold percentages vs. on-the-line
    percentages                               +.55         +.73
Nature of heterogeneity
  Mean vs. fourfold percentages               --           +.93
  Mean vs. on-the-line percentages            --           +.96
  Fourfold vs. on-the-line percentages        --           +.98


1 Perhaps it should be pointed out that, although
they are useful for a comparison of the two techniques,
the standard and perception rankings per se
have little intrinsic value without additional information,
such as knowledge of relevant organizational
practice and policy.

The moderate correlations between four- 
fold and on-the-line homogeneity, as well as 
those discussed earlier between fourfold and 
mean homogeneity, are understandable in
terms of the definition of homogeneity
used with the fourfold technique, which
differed from that used with either of the
other approaches. Both
the on-the-line and the mean definitions of 
homogeneity were more rigorous than that of 
the fourfold method, the first measuring ho- 
mogeneity in terms of proportions of absolute 
category agreement and the latter in terms 
of degree of absolute category agreement, 
whereas the fourfold definition allowed for a 
discrepancy of one category in the “A” box 
and two in the “D” box. 


Conclusions 


To the extent that these research data are 
characteristic of attitude study data, in gen- 
eral, and the questions asked of the data can 
be dealt with by both parametric and non- 
parametric techniques, one will tend to find 
comparable results using either analysis. Al- 
though caution must be exercised relative to 
homogeneity, in general the techniques seem 
interchangeable. 


Received December 17, 1954. 


References 


1. Cronbach, L. J. Statistical methods applied to
the Rorschach scores: a review. Psychol.
Bull., 1949, 46, 396-403.

2. McNemar, Q. General review and summary:
opinion-attitude methodology. Psychol. Bull.,
1946, 43, 293-294, 301-304.

3. Moses, L. E. Non-parametric statistics for psychological
research. Psychol. Bull., 1952, 49,
122-143.

4. Rosen, H., & Rosen, R. A. H. The union member
speaks. New York: Prentice-Hall, Inc.,
1955.

5. Rosen, H., & Rosen, R. A. H. The validity of
“undecided” answers in questionnaire responses.
J. appl. Psychol., 1955, 39, 178-181.




The Journal of Applied Psychology
Vol. 39, No. 6, 1955 


Brand Loyalty—Twelve Years Later¹


Lester Guest 


The Pennsylvania State University 


During the late fall of 1940 and early 
spring of 1941, 813 public school students in 
Grades 3 through the last year of high school 
living near Washington, D. C., indicated their 
awareness of 80 brand names (1). The 
names were then grouped into 16 product 
classifications, and the subjects expressed 
their preferences for one of the names in 
each product category (2). The main pur- 
pose of the research was to estimate the de- 
gree of consistency of preference as age in- 
creased. Although a longitudinal study to 
determine brand loyalty would have been 
more desirable, the dictates of the situation 
made a cross-sectional analysis more feasible. 

Since a record of consistency of prefer- 
ences of the same subjects would be more 
meaningful, during the spring of 1953 an at- 
tempt was made to contact them again, this 
time by mail. It was not possible to utilize 
other than a mail questionnaire in this fol- 
low-up, since reaching the subjects for per- 
sonal interviews would have been impossible 
in many cases, and prohibitive in time and 
money in others. 

This mail questionnaire was subject to all 
the usual disadvantages plus a few that were 
unique. All reasonable efforts were made to 
reach the original participants, but in spite 
of this, some kind of address was available for only 462.
Usable questionnaires were returned by 165 persons, that is,
36% of the 462 who could be addressed, or
20% of the 813 subjects who provided data
for the original study.

Returns of this quantity are naturally sus- 
pect of sampling biases, regardless of the fact 
that only 57% of the original subjects were 
“available” for study. Therefore, a compari- 
son of the characteristics of the original sam- 
ple with this sample is in order. Table 1 pre- 
sents percentage comparisons for sex, age, 
socioeconomic status, and IQ. 

It is readily observed that there are some
differences between the characteristics of the
original sample and those of the present sample
drawn from it. (Age, socioeconomic status,
and IQ are taken from data obtained in
1940-41.) In the cases of socioeconomic
status and IQ the differences are statistically
significant when tested by chi-square analysis. As
might be expected, those returning questionnaires
this time are more heavily from higher
intellectual levels and from childhood homes
that were rated in the higher socioeconomic
brackets. Obviously there would be danger
in trying to overgeneralize from this study
(the original sample was capable of a greater
degree of extrapolation). In spite of this imperfection
of sampling and the small N’s
upon which to base results, it is felt that the
data are suggestive and indicate bases for
further exploration.

1 Funds for this study were made available by the
Council on Research of the Pennsylvania State University.


Results 


The questionnaire proper asked the follow- 
ing questions for each of the 15 products 
queried about: 1. What is your present brand 
preference of those listed (there were five 
well-known brands listed for each product 
plus a provision for answering “none of
these”); 2. if “none of these” is your present
preference, if you had to choose one of the 
brands on the list, which would it be (called 
“forced choice”); 3. what brand of those 
listed is used (bought, owned) most fre- 
quently; and 4. which brands have ever been 
used (bought, owned) as an adult. 

The second part of the questionnaire asked 
each respondent to give the reason why there 
was a discrepancy between present preference 
and present use whenever such a discrepancy 
existed, and provided a rather comprehensive 
check list of potential reasons plus a space to 
add others not on the list.²

In some cases, instructions were not fol- 
lowed exactly. In many such instances it 
was possible to interpret the respondent’s an- 
swer correctly, whereas in others it was neces- 
sary to discard certain answers. This ac- 
counts for the variable N’s found in the 
results. 


2 These data are not presented here, but may be 
obtained upon request. 









Table 1 


Percentage Comparisons for Sex, Age, Socioeconomic Status, and IQ 





Females 


48% 
54% 


Age:    7     8     9     10    11    12    13    14    15    16    17    18

1940    0+%   8%    10%   10%   9%    9%    10%   11%   13%   11%   5%    3%
1953    1%    [illegible]   10%   9%    1%    7%    [illegible]   9%    19%   12%   4%    2%


Socioeconomic status :* A B Cc D ? 


1940-41 
1953 


1940-41 5% 
1953 8% 


38% 
45% 
90-99 100-109 110-119 120-129 130up =? 
11% 4% 


16% 8% 


44% 9% 4% 
30% 41% 4% 
IQ** 
1940-41 1%
1953 1% 


79 down 80-89 


5% 
1% 


‘a 20% 
25% 


22% 
24% 


15% 
10% 


21% 





* Differences significant beyond the 5% level. 
** Differences significant beyond the 1% level. 


The data permitted comparisons of 1940-41
preferences with: (a) preferences in 1953,
(b) “forced-choice” preferences in 1953,
(c) 1953 use, and (d) use at any
time as an adult.

Table 2 presents the data for the 15 product
categories. The difference between columns
headed “total” and “− none” (read
minus none) is that in the “total” columns,
all five brand categories plus “none of these”
were included in the comparisons, whereas in
the “− none” columns, those who failed to
choose a listed brand in 1940-41 were ex- 
cluded. In making the comparisons for the 
“forced choice” situation, only those who 
made a choice of a listed brand 12 years 
previous could legitimately be compared with 
those who “had” to make a choice in 1953. 

The results shown in Table 2 indicate that 
there is little over-all difference between re- 
sults where all persons are considered and 
where those who failed to make a choice of 
a listed brand 12 years ago were eliminated 
from the computations. In all cases except 
one (razors), the “forced-choice” agreement 
data show an increase in percentages of agree- 
ment for preferences made 12 years apart. 
Of course, the subjects were not informed of 
the preferences they made in 1940-41.

What constitutes a high degree of agree- 
ment must of course be partly judgmental. 
However, the data presented in Table 2 indicate
a higher than chance agreement in all
cases. On a pure chance basis, one would
expect only about 3% agreement, and all 
these data are substantially above that figure. 
An average percentage of agreement of 32% 
for all brands of all products (and relatively 
little variation among products) seems to the 
author to be rather significant psychologi- 
cally, and to be at odds with the degree of 
loyalty suggested in the original reports (1, 
2). In those reports, cross-sectional analyses 
seemed to indicate a lack of general loyalty 
toward brand names, whereas the present data 
suggest a good deal of such loyalty when the 
same persons are examined for preferences 
after a span of 12 years. The degree of 
agreement tends to be a little larger when 
those who made no choice of a listed brand 
in 1940-41 are eliminated from the computa- 
tions. 

Consideration of the data showing the re- 
lationship between preferences expressed in 
1940-41 with present use shows less agree- 
ment than with currently expressed prefer- 
ences, but there is still a substantial degree 
of agreement on the average. For whatever 
reason, many of these subjects buy (use, own) 
the same brands of products now that they
verbalized a preference for many years ago. 
When one considers all the factors that might 
prohibit agreement (price differentials, in- 
equality of distribution, and the subject not 
being the primary purchaser, to cite a few), 
there certainly seems to be some carry-over
of early attitudes. Even studies analyzing
the degree of agreement between current 
preferences and current use show far from 
perfect relationships. In fact, data obtained 
in the present study bearing on this factor 
show only 69% agreement between present 
preference and present use. 

The last two columns in Table 2 show per-
centages of times brands preferred in 1940-41
were ever used, including present use.
For example, 70% of all subjects, and 85% 
of those preferring some one of the listed 
brands of coffee in 1940-41, either now use, 
or at some time have used, the preferred 
brand as an adult. The percentages of peo-
ple ever using the brands preferred in child-
hood and adolescence vary from product
to product, with typewriters, automobiles, 
watches, tires, and radios being somewhat 
lower than the rest. 

An interesting thing concerning those prod- 
ucts where the “ever used” percentages are 
relatively small, is that to a large extent the 
same percentages of persons presently use 
their earlier preferred brands as use products 
where the “ever used” percentages are much 
higher. For products where the “ever used” 
percentages are small, it would seem that 
they are the types of products that are rela- 
tively expensive, infrequently bought, and 
frequently received as gifts. For other prod-
ucts, a greater number of people sometimes
try their preferred brand, but reject it on 
some grounds. 

It was thought that loyalty might not be 
related so much to type of product but to 
type of individual. Perhaps there is a hard 
core of persons who tend to be loyal not only 
to one brand of one product, but to one brand 
for each of a number of products. Thus it 
could be that the percentages comparing 
early preferences with present use might be 
made up of about the same people from 
product to product. An analysis was made 
of the number of agreements out of 15 pos- 
sible for each subject and a frequency dis- 
tribution constructed. The highest number 
of agreements for any subject was nine, and 
the average number of agreements was about 
four. Thus, loyalty toward these brands is 
not characteristic of individuals, but is re- 
lated to brands of products. Different peo- 
ple are loyal to brands in different product 
categories. 
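The per-subject tally described above can be sketched as follows. This is only an illustration in Python, assuming each subject's early preferences and present use are stored as product-to-brand mappings; the brand names, the three-product sample, and the function names are hypothetical and not taken from the study.

```python
from collections import Counter

def count_agreements(early, present):
    """Count products where the early preferred brand equals the brand now used.

    Products with no early choice (None) are skipped.
    """
    return sum(
        1
        for product, brand in early.items()
        if brand is not None and present.get(product) == brand
    )

def agreement_distribution(subjects):
    """Frequency distribution of agreement counts across subjects."""
    return Counter(count_agreements(early, present) for early, present in subjects)

# Hypothetical data for two subjects and three products.
subjects = [
    ({"coffee": "Brand A", "soap": "Brand B", "gum": None},
     {"coffee": "Brand A", "soap": "Brand C", "gum": "Brand D"}),
    ({"coffee": "Brand A", "soap": "Brand B", "gum": "Brand D"},
     {"coffee": "Brand A", "soap": "Brand B", "gum": "Brand E"}),
]

distribution = agreement_distribution(subjects)
print(sorted(distribution.items()))
```

A distribution concentrated at a few agreements per subject, with no subjects near the maximum, is what the text reports.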

It was thought that the younger subjects, 
since they might have selected their earlier 
preferences on more perfunctory grounds, 
might have less agreement with current pref- 
erences and use than their older counterparts. 
Therefore, the data were dichotomized by 
age, 7 through 12 years, and 13 through 18 
years (as of 1940-41), and an identical analy- 


Table 2 


Percentages of Agreement Between Original Preferences and:


                  Present Preference
Product           Total    —none
Coffee             35       33
Typewriter         36       38
Dept. store        38       38
Automobile         24       23
Gasoline           33       33
Razor              38       51
Magazine           36       35
Watch              35       35
Tooth paste        27       25
Soap               31       31
Cereal             23       21
Bread              31       30
Tire               34       36
Gum                29       28
Radio              31       32

Present 
Preference- 
Forced Choice 


41 
41 
41 
34 
38 
38 
44 
48 


37 
33 
41 
43 
36 
40 


Ever 
Used 


Present 
Use 

Total 
33 
25 
36 
20 
29 
36 
32 
26 


—none Total —none 


85 
34 
91 
40 
87 
76 
80 
37 
84 


70 
36 
89 
44 
82 


80 
47 
84 


S4 
78 
48 
82 
42 


85 
79 
45 
81 





32 32 


Average 


39 


38 


68 










sis to that just described performed for these 
two groups. There were some differences be- 
tween the younger and older subjects, but on 
the whole they were small and not consist- 
ently in one direction. If anything, there is 
a suggestion of greater amounts of agreement 
for the younger group than for the older 
group. 

In addition to analyzing the data by age 
groupings, they were also analyzed by sex, 
socioeconomic status (as of 1940-41), IQ, 
and marital status. It was expected that 
women would be more likely to have higher 
degrees of agreement for products such as 
coffee, bread, and the like, and men be higher 
in automobiles, gasoline, and so on. The 
analysis did not bear this expectation out, 
and although there were some differences that 
approached statistical significance, they were 
not in the expected direction, and for most 
products the differences were small. 

The same general statement might be made 
about the results when socioeconomic group-
ings were the basis of analysis. Most of the
differences were small and not consistently in 
one direction. Of course, the subjects’ pres- 
ent status might be quite different from what 
it was at the time of the original study. 

In the case of IQ, two groupings were 
made, those with IQ’s up to 109, and those 
with IQ’s 110 and up. Again, most differ- 
ences were not significant, but for those prod- 
ucts where the differences approached signifi- 
cance, they were in favor of higher agree- 
ments for those with higher IQ’s. However, 
there is no patterning of products that were 
significant vs. those not significant. 

When the subjects were divided into those 
who were married vs. those who were not, 
several significant differences appeared, all 
except one in favor of higher agreements for 
the unmarried group. This would be con- 
sistent with expectation, since those unmar- 
ried would be able to select their preferred 
brands without consideration of a mate’s 
preferences. Even in this case, however, 
there seems to be no patterning in terms of 
which products’ brands get relatively larger 
degrees of agreement for one group or the 
other. 

Qualitative consideration of the five break- 
downs together gives a slight suggestion that 
differences between groups are more likely to 
be found for typewriters, gasoline, magazines,
watches, and tooth pastes. However, all
analyses provide results too variable and in- 
significant to be able to say anything definite 
about them. 


Conclusions 


This study presents suggestive evidence that 
there is a rather high degree of loyalty to- 
ward brand names, especially where special 
considerations such as unavailability, price 
considerations, and the respondent not being 
the primary purchaser, do not play a major 
part in brand selection. After a lapse of 12 
years, with preferences originally being ver- 
bally expressed during the ages of 7 through 
18, there is an average amount of agreement 
between early and late preferences of from 
32% to 39% depending on the kind of com- 
parison. The average degree of agreement 
drops to about 27% when early preferences 
and present use are compared. 

There is evidence that the degree of loy- 
alty is not a function of the age at which 
original preferences are stated. It would ap- 
pear that one’s mind is made up early in life, 
and although deviations will occur, they are 
not related to degree of sophistication at the 
time the preference is made. Furthermore, 
one cannot say that some persons tend to 
have a general loyalty factor and others little. 
Persons seem to have loyalty toward some 
things and not toward others. There is a 
suggestion that specific pressures decide for 
each individual where loyalties exist, but the 
data do not permit direct analysis of these. 

Although this study used brand names as 
materials, there are implications for other 
attitudinal areas. Political preferences, atti- 
tudes toward minority groups, toward na- 
tional ideologies, and toward religious issues, 
are a few other content areas in which crea- 
tion and modification of attitudes might fol- 
low the same course as for brand names. A 
long-term study with regular follow-up per- 
sonal interviews is probably necessary in or- 
der to get at such attitudinal factors for these 
complex areas. 


Received February 14, 1955. 


References 


1. Guest, L. P. The genesis of brand awareness. 
J. appl. Psychol., 1942, 26, 800-808. 

2. Guest, L. A study of brand loyalty. J. appl.
Psychol., 1944, 28, 16-27.





The Journal of Applied Psychology 
Vol. 39, No. 6, 1955 


Relation of Positive and Negative Sociometric Valuations 
to Social and Personal Adjustment 
of School Children 


Beeman N. Phillips 


Indiana State Department of Public Instruction 


and M. Vere DeVault 


The University of Texas 


Since the publication of the original work 
of Moreno (3) in 1934, many studies have 
been designed to analyze the relationship be- 
tween personality adjustment and social ac- 
ceptance. Inconsistencies in the findings re- 
ported by these studies are due in part to the 
utilization of many variations of Moreno’s 
original technique for measuring social ac- 
ceptance. Moreno’s original technique was 
that of asking children to name classmates 
near whom they would like to sit. Elaborat- 
ing upon this idea, researchers have utilized 
as measures of social acceptance the number 
of valentines received, the number of votes 
received in class elections, the number of 
best-friend choices received from classmates, 
and a variety of other techniques. Various 
combinations of these measures have been 
widely used. 

Two frequently used methods of obtaining 
social acceptance scores are of importance to 
this study. First, there are those studies 
which have utilized only the positive choice 
as a determiner of social acceptance. These 
choices are usually in response to a question 
asking the names of persons near whom or 
with whom one would like to work. Second, 
there are those studies which utilize both posi- 
tive and negative choices, studies in which 
children name those individuals near whom 
they would rather not sit, as well as those 
near whom they wish to sit. 

Frequently in these studies the social ac- 
ceptance score for an individual is the dif- 
ference between the number of acceptance 
choices and the number of rejection choices 
received from classmates. 
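A small sketch shows what such a difference score does to two differently placed children. The Python below is illustrative only; the counts are hypothetical and the function name is an assumption, not a procedure from any of the cited studies.

```python
def acceptance_score(positive, negative):
    """Single social acceptance score: acceptance choices minus rejection choices."""
    return positive - negative

# Hypothetical (positive, negative) choice counts received by three children.
children = {
    "child_1": (6, 6),  # chosen by many, rejected by as many
    "child_2": (0, 0),  # neither chosen nor rejected
    "child_3": (6, 0),  # chosen by many, rejected by none
}

scores = {name: acceptance_score(p, n) for name, (p, n) in children.items()}
print(scores)
```

The first two children receive the same score although their positions in the class differ, which is the kind of information a single score cannot carry.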

Some justification is found for the use of 
these particular techniques in a study by 
Jennings (2) in which 133 girls were tested 


twice during an eight-month period. She 
reported negative correlations of -.33 and
-.50 between the number of individuals who
rejected the subject and the number who 
chose her. These correlations indicate some 
inverse synonymity of negative and positive 
valuations. 

Classroom teachers often hesitate to use 
the negative choice because they feel that 
little additional information is gained and be- 
cause they wish to help children approach so- 
cial relationships with a positive attitude. 
They believe that asking children to name 
individuals with whom they would not wish 
to work is not a helpful approach to this 
problem. 

However, these same teachers frequently 
fail to recognize individuals who are well 
accepted by members of certain subgroups 
within the class but who are rejected by as 
many members in another subgroup. For the 
identification of these individuals, the use of 
only positive valuations of classmates is of 
limited value. 

The purpose of the present study was to 
investigate the merits of the use of either or 
both positive and negative sociometric choices 
in determining personality adjustment of 
school children. 


Method 


The study was designed to analyze the relation- 
ship between personality adjustment as measured by 
the California Test of Personality and positive and 
negative sociometric valuation variables. 

The seven classes used in the study included six 
third-grade classes in a city of five elementary 
schools. At least one class was used in each of these 
five schools. The seventh class was taken from a 
nearby rural consolidated school. There were a
total of 250 children in these seven classes, which
ranged in size from 33 to 39 pupils.




In order to obtain sociometric data, children were 
asked to name the three pupils near whom they 
would like to sit and the two near whom they would 
not like to sit. These data were then used to assign 
children to one of four patterns of sociometric valua- 
tions as follows: (a) those with many positive and 
many negative valuations, (b) those with many posi- 
tive and few negative valuations, (c) those with few 
positive and many negative valuations, and (d) those 
with few positive and few negative valuations. The 
25 most characteristic of each of these four patterns 
were included in the 2 × 2 factorial analysis of vari-
ance used to test the effects of positive and negative
valuations on subtest scores. Thus, data relative to 
100 children were used in the statistical treatment. 

In the remainder of this study the symbols MPV 
were used to represent those children receiving many 
positive valuations, the symbols FPV were used for 
those receiving few positive valuations, the symbols 
MNV were used for those receiving many negative 
valuations, and the symbols FNV were used for 
those receiving few negative valuations. 
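The four-way assignment can be sketched in Python as below. The report does not state the cutoffs separating "many" from "few" valuations, so the cutoff values here are hypothetical, as are the function and parameter names; the sketch also omits the selection of the 25 most characteristic children per pattern.

```python
def valuation_pattern(positive, negative, pos_cutoff, neg_cutoff):
    """Label a child by counts of positive and negative valuations received.

    pos_cutoff and neg_cutoff are assumed thresholds for "many" valuations.
    """
    pos_label = "MPV" if positive >= pos_cutoff else "FPV"
    neg_label = "MNV" if negative >= neg_cutoff else "FNV"
    return f"{pos_label}-{neg_label}"

# Hypothetical cutoffs and counts, for illustration only.
POS_CUTOFF, NEG_CUTOFF = 4, 3

examples = {
    (6, 5): valuation_pattern(6, 5, POS_CUTOFF, NEG_CUTOFF),
    (6, 0): valuation_pattern(6, 0, POS_CUTOFF, NEG_CUTOFF),
    (1, 5): valuation_pattern(1, 5, POS_CUTOFF, NEG_CUTOFF),
    (1, 0): valuation_pattern(1, 0, POS_CUTOFF, NEG_CUTOFF),
}
print(examples)
```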


Results 


The mean number of questions answered 
by children in each group in the direction of 
desirable adjustment is shown in Table 1. 

It can be seen from Table 1 that social ad-
justment scores generally were higher than
personal adjustment scores. It might be hy-
pothesized that this increased social security
is the result of the present emphasis on the 
development of social skills in our schools. 
Such an hypothesis, however, is not supported 
by direct evidence and must await further in- 
vestigation. 

Scores were the highest for the pupils with 
many positive valuations and few negative 
valuations, and the lowest for those with few 
positive and many negative valuations. In 
most instances, scores of the other two groups 
fell between these two extremes. 

The effect of positive and negative valua-
tions on adjustment scores was further ana-
lyzed by means of a 2 × 2 factorial analysis of
variance. The results of these analyses are
shown in Table 2. For 1 and 96 degrees of 
freedom an F value of 3.94 was significant at 
the 5 per cent level and a value of 6.90 was 
significant at the 1 per cent level. 
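The F values in Table 2 follow from dividing each factor's mean square by the error mean square. The Python sketch below works through one block of mean squares from that table (22.09, 16.81, 2.89, with error 2.46) against the two critical values quoted above; the function names are illustrative, and the error degrees of freedom are taken as 4 groups of 25 less 4 group means.

```python
F_CRIT_05 = 3.94  # critical F for 1 and 96 df, 5 per cent level (from the text)
F_CRIT_01 = 6.90  # critical F for 1 and 96 df, 1 per cent level (from the text)

N_GROUPS, N_PER_GROUP = 4, 25
df_error = N_GROUPS * N_PER_GROUP - N_GROUPS  # 100 children - 4 groups = 96

def f_ratio(ms_effect, ms_error):
    """F ratio for a single-df effect: effect mean square over error mean square."""
    return ms_effect / ms_error

def significance(f):
    """Return the table's marker for an F value: '**', '*', or ''."""
    if f >= F_CRIT_01:
        return "**"
    if f >= F_CRIT_05:
        return "*"
    return ""

# One block of mean squares as printed in Table 2.
mean_squares = {"Positive": 22.09, "Negative": 16.81, "Interaction": 2.89}
ms_error = 2.46

results = {k: round(f_ratio(ms, ms_error), 2) for k, ms in mean_squares.items()}
flags = {k: significance(f) for k, f in results.items()}
for factor in mean_squares:
    print(f"{factor:<12}{results[factor]:>6.2f}{flags[factor]}")
```

The computed ratios reproduce the 8.98**, 6.83*, and 1.17 printed for that block.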

In the analysis, if an F value was signifi- 
cant at or beyond the 5 per cent level, Bart- 
lett’s test was used to determine whether there 
were significant differences in the variances 
of the groups. None of these chi squares, 
however, were significant at the 5 per cent 
level. 


Table 1 


Mean Number of Questions Each Group Answered in the Direction of Desirable Adjustment on 
Each Subtest of the California Test of Personality 








Subtest    Group: MPV-MNV, MPV-FNV, FPV-MNV, FPV-FNV


Personal adjustment
1A Self-reliance 5.6 5.8
1B Sense of personal worth 5.8 6.8
1C Sense of personal freedom 6.5 6.6
1D Feeling of belonging 6.0 6.5
1E Withdrawing tendencies (freedom from) 5.8
1F Nervous symptoms (freedom from) 4.9


Social adjustment
2A Social standards
2B Social skills
2C Antisocial tendencies (freedom from)
2D Family relations
2E School relations
2F Community relations










Table 2


Analysis of Variance of Subtest Scores of Four Groups
of Individuals with Different Combinations
of Positive and Negative Sociometric Valuations


Subtest   Sociometric Factor   Mean Square   F

          Positive                 0.16
          Negative                 0.16
          Interaction              0.36
          Error                    1.84

          Positive                34.81      14.56**
          Negative                33.64      14.08**
          Interaction              0.81
          Error                    2.39

          Positive                13.69       6.06*
          Negative                 2.89       1.28
          Interaction              1.69
          Error                    2.26

          Positive                22.09       8.98**
          Negative                16.81       6.83*
          Interaction              2.89       1.17
          Error                    2.46

          Positive                33.64      10.04**
          Negative                 4.00       1.19
          Interaction              9.00       2.69
          Error                    3.35

          Positive                 2.56
          Negative                16.00
          Interaction              3.24
          Error                    3.19

          Positive
          Negative                 0.04
          Interaction              0.36
          Error                    1.00

          Positive                 3.61
          Negative                 6.25
          Interaction              1.21
          Error                    1.34

          Positive                 4.00
          Negative                 4.84
          Interaction              0.64
          Error                    2.59

          Positive                 0.16
          Negative
          Interaction
          Error
                                   4.00
                                   1.63

          Positive                 1.96
          Negative                 4.00
          Interaction              0.36
          Error                    2.07

          Positive                 1.00
          Negative                 9.00
          Interaction              1.44
          Error

* Significant at the 5% level.
** Significant at the 1% level.



In the personal adjustment portion of the 
test, either the positive or the negative factor 
was significantly related to every subsection 
except the subsection on self-reliance. In the 
social adjustment portion of the test either 
the positive or negative factor was signifi- 
cantly related only to the subsections con- 
cerned with social skills and community re- 
lations. Thus, the number of positive or nega- 
tive sociometric valuations received was not 
related significantly to personality adjustment 
in terms of social standards, antisocial tend- 
encies, family relations, or school relations. 

In that part of the test concerned with per- 
sonal adjustment, children with many positive 
valuations as compared with those with few 
or no positive valuations had better adjust- 
ment in terms of their sense of personal 
worth, sense of personal freedom, feeling of 
belonging, and freedom from withdrawing 
tendencies. Significant differences were not 
revealed relative to self-reliance and freedom 
from nervous symptoms. In that part of the 
test concerned with social adjustment, no sig- 
nificant differences were found between those 
pupils with many positive valuations as com- 
pared with those with few or no positive 
valuations. 

Children with few or no negative valua- 
tions as compared with those with many nega- 
tive valuations had better adjustment in terms 
of their sense of personal worth, feeling of be- 
longing, and freedom from nervous symptoms. 
In two of the subsections concerned with so- 
cial adjustment—social skills and commu- 
nity relations—there were similar differences. 

In no instance was there a significant inter-
action of the positive and the negative factor.
Thus, significant relationships were due only
to possession of, or lack of, either positive or 
negative valuations. No additional signifi- 
cant relationships were revealed as a result 
of either positive or negative valuations. 


Conclusions 


Seven of the 12 subsections of the Cali- 
fornia Test of Personality produced evidence 
to indicate a relationship between one’s social 
position among his peers and some aspect of 
personality adjustment as measured by this 
test. 

On two subsections of the test (sense of 
personal worth and feeling of belonging) both 
positive and negative valuation differences re- 
vealed significant relationships. On two other 
subsections (sense of personal freedom and 
freedom from withdrawing tendencies) signifi- 
cant differences were found only when con- 
sidering the positive valuation variable. On 
three additional subsections (freedom from 
nervous symptoms, social skills, and com-
munity relations) significant differences were
found only when considering the negative
valuation variable. 

These relationships suggest that sociomet- 
ric techniques can be of help to teachers who 
wish to understand the personality adjust- 
ment of their pupils. They suggest that both 
positive and negative valuations have distinct 
contributions to make in the understanding of 
pupil adjustment. These findings also sug- 
gest a warning for those of us who would de- 
rive a single score from positive and negative 
valuations. Results indicate rather that data 


concerning positive and negative valuations 
contain information of two kinds and thus 
should be analyzed independently. 


Received January 10, 1955. 


References 


1. Edwards, A. L. Statistical analysis for students 
in psychology and education. New York: 
Rinehart, 1950. 

2. Jennings, Helen Hall. Leadership and isolation. 
New York: Longmans, Green, 1943. 

3. Moreno, J. L. Who shall survive? New York: 
Beacon House, 1934. 





The Journal of Applied Psychology 
Vol. 39, No. 6, 1955 


Is Interest Maturity Related to Linguistic Development? 


Maurice D. Woolf 
Kansas State College 


and Jeanne A. Woolf 


Manhattan, Kansas 


It is possible that a student whose linguis- 
tic ability is not equal to his quantitative 
ability will tend to have certain distinguish- 
ing traits, interests, or personal character- 
istics. Knowledge of these characteristics 
might contribute to the effectiveness of 
remedial and counseling services for such 
students and to a thorough understanding of 
their needs. Two groups of students, equated 
for quantitative ability but differing in lin- 
guistic ability, were studied to determine 
whether or not they differed in other respects. 


Procedures 


From an enrollment of 1,231 college freshmen in 
a scientific-technical college in the year 1949, 119 
were found to rank 20 or more percentile points 
higher in quantitative ability than in linguistic abil- 
ity, as measured by the American Council on Edu- 
cation Psychological Examination prior to enroll- 
ment. This group will be called the A group. The 
mean difference between “Q” and “L” for students 
in this group was 44 percentile points in favor of
“Q.”

In the B group of 110 students from the same 
class, the “Q” rank of each was approximately equal 
to the “L” rank. The greatest difference between 
“Q” and “L” for any one student in this group was 
four percentile points, and the mean difference was 
two percentile points. 
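The grouping rule can be sketched in Python as follows. The 20-point criterion for the A group is stated in the text; treating the four-point maximum observed in the B group as a cutoff is an assumption made only for illustration, and the function name is hypothetical.

```python
def assign_group(q_percentile, l_percentile):
    """Assign a student to Group A, Group B, or neither (None).

    A: "Q" exceeds "L" by 20 or more percentile points (criterion from the text).
    B: "Q" and "L" differ by no more than 4 points (assumed cutoff, based on the
       largest difference reported for that group).
    """
    diff = q_percentile - l_percentile
    if diff >= 20:
        return "A"
    if abs(diff) <= 4:
        return "B"
    return None

print(assign_group(81, 37))  # a 44-point gap in favor of "Q"
print(assign_group(75, 73))  # a 2-point gap
print(assign_group(60, 50))  # a 10-point gap, in neither group
```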

Further differences in linguistic development were 
established by comparing the raw scores of the two 


groups on the Cooperative English Test, Form OM, 
and the Cooperative Reading Test C-2, as reported 
in Table 1. 

Percentile ranks for mean raw scores of the A 
group in these achievement tests, by local norms, 
were 49, 50, and 45, respectively, or relatively low 
as compared with the percentile rank of 81 in “Q.” 
Percentile ranks for students in the B group in Eng- 
lish and reading achievement were 74, 74, 73, or 
fairly consistent with rank in quantitative ability. 
Differences between the groups in English total and 
reading speed and comprehension are reliable be- 
yond the 1% level of confidence. 

The scores of these students were available for six 
group scales of the Strong Vocational Interest Blank 
for men and for three nonoccupational scales: occupa-
tional level, masculinity-femininity, and interest ma-
turity. A t score was calculated from the mean raw
scores of Group A as compared with those of Group
B in each of the nine SVIB scales. 
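As a rough check on how such a t score is formed, the Python sketch below divides a difference between group means by its standard error, using the social welfare figures from Table 2. It assumes that the column headed "Sd" is the standard error of the difference and that the group means are -15.62 and -.77; both readings are interpretations of the printed table, and the function name is illustrative.

```python
def t_from_difference(mean_a, mean_b, se_diff):
    """t score as the difference between two group means over its standard error."""
    return (mean_b - mean_a) / se_diff

# Social welfare scale, Table 2 (assumed reading of the printed values).
mean_group_a = -15.62
mean_group_b = -0.77
se_difference = 5.1685

t_social = t_from_difference(mean_group_a, mean_group_b, se_difference)
print(round(mean_group_b - mean_group_a, 2), round(t_social, 2))
```

Under these assumptions the difference of 14.85 and the t of 2.87 shown for that scale are recovered.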


Results 


Significant differences between the two
groups are reported in Table 2.

Group B, with superior linguistic ability
and superior verbal skills, proved to rank
higher than Group A in interest maturity,
and the difference is statistically significant
at the 1% level of confidence. Group A
ranks at the 28th percentile and Group B at
the 44th percentile in IM as compared with
18-year-old men in general (2, p. 263).



Table 1 


Tested Linguistic Skills of Groups A and B Compared by Means of t Scores 


No. Skill 


118, 
110 


Group 
English total 
178.06 


118 
110 


16.81 
24.79 


Reading speed 


118 
110 


12.26 
17.29 


Comprehension 


1.1716 





* Significant beyond the 1% level of confidence. 




Table 2


Tested Interest Maturity and Social-Welfare Interests of Groups A and B Compared by Means of t Scores


SVIB Scale          Group   No.   Mean RS   Rating   Diff.    Sd        t      P
Social welfare      A       119   -15.62    (B-)
                    B       110     -.77    (B)      14.85    5.1685   2.87   .01*
Interest maturity   A       116    -8.16    28
                    B       110    38.55    44       46.71   14.1390   3.04   .01*

* Significant at 1% level of confidence.


Neither shows a pronounced interest in the
work of the social science teacher, minister,
YMCA worker, or city school superintendent.
However, the interests of Group B are more
nearly like those of men in these social wel-
fare occupations than are the interests of
Group A by a statistically reliable margin.

Scores of Groups A and B on five addi-
tional occupational group scales and two non-
occupational scales failed to differentiate sig-
nificantly between the two groups. Data
from these scales are given in Table 3.


Discussion


The total number of test items liked by
men tends to increase and the total number
disliked to decrease between the ages of 15
and 25 years. A decrease in dislikes relat-
ing to occupations and school subjects is also
noticeable during this period (2, pp. 286,
287). Presumably, these changes are reflected
in interest maturity scores. Two-thirds of 
the changes in interest occur before the age 
of 18.5 (2, p. 270). The low mean IM score 
suggests that these customary changes in in- 
terests have not occurred among members of 
the linguistically inferior group. The mean 
IM score of Group B, near average for 18- 
year-old men in general, suggests that stu- 
dents with “L” equal to “Q” have increased 
their total number of likes and decreased 
their total number of dislikes, according to 
the usual pattern of development among men
in general between the ages of 15 and 18.5. 

This difference is interesting in view of the
fact that the IM scale correlates with intelli-
gence at -.16 and negatively with both sci-
entific and linguistic interests (2, pp. 264,
285). Apparently IM reflects a balance be-
tween linguistic and quantitative abilities

Table 3 


Comparisons of Raw Scores of Groups A and B on Seven SVIB Scales 


SVIB Scale
Art-medical
Engineering-chemistry
Business-detail 

Sales 

Writing-law 


Occupational level 


Masculinity-femininity 


Mean RS 


— 19.68 
— 6.97 


6.29 
2.95 


5.91 


2.35 


2.20 
— 4.39 


—45.69 
— 48.43 


11.42 
15.90 


44.94 


34.30 


Rating 
(B—) 


(B) 4.1515 


(B-—) 


(B- 5.0802 


(B) 
(B) 2.3765 


(B) 
(B) 3.8384 


(B—) 
(B—) 


5.5932 
6.1743 


10.64 14.1908 





rather than general intelligence as repre-
sented by a total score.

The low IM scores of students in Group A 
suggest that they are more interested in such 
activities as movies, picnics, and feats of 
physical skill and daring, and less interested 
in academic and professional activities than 
are 18-year-old men in general. Among the 
14 occupations for which 15-year-old boys 
have a positive attitude are aviator, ex- 
plorer, inventor, secret service man, athletic 
director, ship’s officer, auto racer, and auto 
repair man (2, pp. 294-295). Scientific in- 
terests tend to decline between the ages of 
15 and 25. The interests of Group A are 
more like those of 15-year-old boys than are 
the interests of Group B, and less like those 
of 18-year-old men in general than are the 
interests of Group B. 

If the IM score is interpreted in terms of 
stability of interests (2, p. 281), these find- 
ings may shed some light on a previous study 
of mortality among students enrolled in en- 
gineering (3, pp. 233-234). Students with 
low “L” scores tended to be eliminated from 
architecture, and mechanical, architectural, 
and civil engineering curricula before the 
senior year of college. Students who re- 
mained in these curricula for three or more 
years tended to produce “L” scores virtually
equal to “Q” scores. Interest maturity as it 
relates to linguistic development may have 
affected mortality. The interest patterns of 
students in Group A, particularly if scientific 
in nature, may be subject to change. 

Differences between Groups A and B in 
“social intelligence” are suggested by Dar-
ley’s definition, “Interest-maturity, redefined
as a phase of personality, might characterize 
the well-organized, socially sensitive, gener- 
ally mature, tolerant, insightful individual” 
(1, p. 60). 

Students with primary interest patterns in 
social-welfare occupations tend to rank sig- 
nificantly higher in IM than do students with 
primary interests in four other occupational 
areas and those with no primary interest pat- 
terns, according to Darley (1, p. 61). Scores 
of students in social adjustment, social pref- 
erence and social behavior favor the welfare 
group over five other occupational groups by 
statistically reliable margins (1, p. 64). The 
low scores of Group A in social welfare inter-
est and IM suggest a relationship between
linguistic development and social develop- 
ment. 

The hypothesis that occupational interest 
types grow out of the development of the in- 
dividual’s personality and self concept is cau- 
tiously suggested by Darley (1, pp. 57, 65). 
Might a low score in interest maturity be 
suggestive of delay in the development of a 
differentiated self concept? 


Summary and Questions 


SVIB scores of two groups of college fresh- 
men, equated for quantitative ability but dif- 
fering in linguistic ability and skills were 
compared. The B group with “L” scores 
equal to “Q” scores ranked significantly 
higher in interest maturity. The tested in- 
terests of the linguistically inferior A group 
were less similar to those of men in social 
welfare occupations than were the tested in- 
terests of the B group. Linguistic develop- 
ment appears related to social development 
and general maturity. 

Strong and others present evidence that 
IM is not a function of general intelligence 
and that it is unrelated to either linguistic or 
scientific interests. In view of findings re- 
ported here, IM appears related to a balance 
in development between verbal and quantita- 
tive abilities. 

Some questions can be raised: “Would a 
group of students with high ‘L’ and low ‘Q’ 
tend to yield appreciably lower interest-ma- 
turity scores than a group whose ‘L’ and ‘Q’ 
are nearly equivalent? Or would students 
with high ‘L’ and low ‘Q’ tend to make 
equally high scores in IM as those with even 
development in ‘L’ and ‘Q’? Would students 
with high ‘L’ and low ‘Q’ make appreciably 
higher IM scores than students with high ‘Q’ 
and low ‘L’?” 


Received January 14, 1955. 


References 


1. Darley, J. G. Clinical aspects and interpretation
of the Strong Vocational Interest Blank. New
York: Psychological Corporation, 1941.

2. Strong, E. K. Vocational interests of men and
women. Stanford: Stanford Univer. Press,
1943.

3. Woolf, M. D., & Woolf, Jeanne A. The student 
personnel program. New York: McGraw- 
Hill, 1953. 





The Journal of Applied Psychology
Vol. 39, No. 6, 1955 


Technique of Problem Solving as a Predictor of Achieve-
ment in a Mechanics Course 1


Eugene L. Gaier 


Louisiana State University 


In a previous paper, Cross and Gaier (2) 
reported the development of an instrument— 
the Balance Problems Test (BPT)—designed 
to assess individual differences in techniques 
of problem solving. This test affords indices 
of predisposition for the use of facts versus 
the use of principles in problem solving, as 
well as a measure of problem-solving ability. 
Two of the findings are of major import here: 
(a) the kind of information selected and uti- 
lized by subjects was significantly related to 
their success in problem solving, and (b) the
BPT, reflecting degree of preference of prin- 
ciples versus facts in problem solving, was in- 
dicated to be a more effective predictor of 
educational achievement (viz., high school 
mathematics grades) than tests of general 
aptitude. 

The purpose of the present study is to in- 
vestigate the BPT as a predictor of educa- 
tional achievement in a military situation. 
Specifically, this study was initiated to in- 
vestigate how good the BPT is as a predictor 
of achievement defined by final grades in the 
Airplane and Engine (A. & E.) Mechanic, 
General course, and tests of mechanical job 
knowledge. The following hypotheses were 
proposed: 


1. Proficiency of airmen in solving the con- 
stituent problems of the BPT is positively re- 
lated to the degree of preference for prin- 
ciples over facts. 

2. Aircraft and engine mechanical profi- 
ciency measured by grades in A. & E. and by 
two job-knowledge tests is positively related 
to the preference for principles and number 
of BPT problems solved. 


1 This study was supported in part by the United 
States Air Force under Contract AF 33(038)—25726, 
monitored by the Commanding Officer, Human Re- 
sources Research Center, Attention: Director of Op- 
erations, Lackland Air Force Base, San Antonio, 
Texas. Permission is granted for reproduction, pub- 
lication, use, and disposal in whole or in part by 
or for the United States Government. 


Tests, Subjects, and Procedure 


Tests. The Balance Problems Test was adminis- 
tered to 216 A. & E. mechanics at three Air Force 
bases, together with two forms of a job-knowledge 
test (JKT-A and JKT-B) assembled by the Train- 
ing Research Laboratory of the University of Illi- 
nois. These 66-item job-knowledge tests included 
items previously found by the author to be related 
to amounts of Air Force mechanical training and 
job experience. Standard scores on the tests of the 
Airman Classification Battery (ACB), as well as 
aptitude indices (derived from a composite of test 
scores), were available for the subjects. Of the 216 Air 
Force mechanics employed as subjects, 176 men had 
basic mechanical training and experience. The re- 
maining 40 men had completed an additional ad- 
vanced mechanical course, plus having six months or 
more of line experience. They ranged in age from 
17 to 34 years, the mean age being 19.7. In formal 
education, the sample ranged from seventh grade to 
the completion of college, with a mean of 11.2 years 
of schooling. 

Scores on the BPT. The following four scores derived 
from the BPT, referred to as Variables 1, 2, 
3, and 4, were employed to test our hypotheses: 

Variable 1—number correct: a count of the num- 
ber of subproblems answered correctly (range 0 to 
24) irrespective of the kind of information used by 
the subject in his solution. This index was taken as 
a measure of competency or problem-solving profi- 
ciency. 

Variable 2—number of facts uncovered: frequency 
count of the number of facts uncovered (range 0 to 
24). No attempt was made to evaluate the effec- 
tiveness of the subject’s use of the information un- 
covered. This count was taken as a measure of de- 
pendency on factual information in problem solving. 

Variable 3—number of principles uncovered: fre- 
quency count of the number of principles uncovered 
with a range from zero to six employed as a measure 
of relative dependency on generalized information. 
As in Variable 2, no attempt was made to evaluate 
the subject’s effective application of the information 
uncovered. 

Variable 4—weighted test score (S = C + 24 - F): 
the number correct plus 24 minus the number of 
facts uncovered. A constant of 24 was inserted to 
eliminate negative scores in those cases where the 
number of facts uncovered exceeded the number of 
problems correctly solved.2 This score, previously 
shown (2) to be an excellent predictor of achievement 
in high school mathematics, was developed as 
a joint measure of the number of facts uncovered 
and the subject’s problem-solving proficiency on the 
test. The fewer facts a subject uncovered, the more 
dependent was he on principles. 


2 The BPT would probably be improved if it were 
so adapted in difficulty that the number of facts uncovered 
never exceeded the number correct. 


Table 1 

Product-Moment Correlation Coefficients for Scores on 
the BPT and Mechanical Achievement Indices 

                                          BPT Scores 
Mechanical Achievement        1            2             3             4 
Variables                     No.          No.           No.           C + 24 - F 
                              Correct      Facts         Principles 

JKT, Form A                   .30          [illegible]   [illegible]   .39 
JKT, Form B                   .35          -.14*         [illegible]   .34 
Final A. & E. School Grade    .44          [illegible]   [illegible]   .45 

* Significant at the 5% level, but less than the 1% level. All 
other values in the table are significant at beyond the 1% 
confidence level. 
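
The weighted score defined under Variable 4 can be illustrated with a minimal sketch. Only the formula S = C + 24 - F and the 0 to 24 ranges come from the text; the function name `weighted_bpt_score` and the example counts are hypothetical and are not data from the study.

```python
def weighted_bpt_score(num_correct: int, num_facts: int) -> int:
    """Weighted BPT score S = C + 24 - F (Variable 4 in the text)."""
    # Both counts are described in the text as ranging from 0 to 24.
    for name, value in (("num_correct", num_correct), ("num_facts", num_facts)):
        if not 0 <= value <= 24:
            raise ValueError(f"{name} must lie between 0 and 24, got {value}")
    # The constant 24 keeps the score non-negative even when F exceeds C.
    return num_correct + 24 - num_facts


# Hypothetical subjects (illustrative counts only)
print(weighted_bpt_score(18, 4))   # many problems solved, few facts uncovered
print(weighted_bpt_score(6, 20))   # few problems solved, many facts uncovered
```

Under this definition the score runs from 0 (nothing solved, all 24 facts uncovered) to 48 (all 24 subproblems solved, no facts uncovered), so a higher score reflects both proficiency and lower reliance on facts.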


Results 


Hypothesis I. Variables 1, 2, and 3 were 
employed to test Hypothesis I. Number of 
problems correctly solved has a correlation of 
— .05 with number of facts uncovered and of 
+ .17 with number of principles used. These 
results represent only a very slight indication 
that the hypothesis is tenable. 

Hypothesis II. The correlations of the 
mechanical achievement variables with the 
four BPT scores are presented in Table 1. 
All but one of the 12 correlations are signifi- 
cant at the 1% level or beyond. The other 
value (r = — .14), between Variable 2 and 
Form B of the JKT, however, is significant 
at beyond the 5% level. These results serve 
to support the second hypothesis, viz., that 
the indices of mechanical achievement are 
positively related both to the number of 
problems solved and the tendency to prefer 
principles. The correlations of the three 
mechanical achievement indices with the BPT 
weighted score are as high as, or higher than, 
the analogous correlations for Variables 1, 2, 
or 3. While the values are not significantly 
different from each other, they do provide 
support for the view that the weighted score 
is at least as good a predictor as any one of 
the other three BPT indices. 
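
The significance levels quoted for these coefficients can be checked approximately with Fisher's z transformation. This is a sketch under stated assumptions: the paper does not say which test was used, and the sample size of 216 is assumed to apply to every coefficient. The function name `fisher_z_p_value` is hypothetical.

```python
import math


def fisher_z_p_value(r: float, n: int) -> float:
    """Approximate two-tailed p-value for H0: rho = 0 using Fisher's z."""
    # atanh(r) is approximately normal with standard error 1 / sqrt(n - 3).
    z = math.atanh(r) * math.sqrt(n - 3)
    # Two-tailed tail area of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


N = 216  # assumption: all 216 airmen contribute to each correlation

for label, r in [("Variable 2 with JKT, Form B", -0.14),
                 ("Weighted score with final grade", 0.45)]:
    p = fisher_z_p_value(r, N)
    print(f"{label}: r = {r:+.2f}, approximate p = {p:.4f}")
```

With these assumptions the coefficient of -.14 falls between the 5% and 1% levels, while .45 lies far beyond the 1% level, which agrees with the levels reported in the text.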




The negative direction of the correlation 
obtained between BPT Score 2 and the other 
variables is both in line with expectations 
and logically consistent with the positive cor- 
relation existing between the weighted score 
and final school grade (r = .45, significant 
at beyond the .01 level). The results from 
investigating the second hypothesis indicate 
that airmen who do well in the A. & E. 
course tend to employ more principles and 
fewer facts on the BPT and tend to solve 
more of the test problems. 

Relationships of BPT scores to aptitude. 
Of the 24 correlation coefficients in Table 2, 
four are not significant. In all but one case the 
weighted score correlation with the ACB 
measures was either equal to or greater than 
the analogous correlations for Variables 1, 2, 
or 3. This result indicates that the weighted 
score is as good an index of aptitude as any 
of the other three variables of the BPT. 

The prediction of school grades. To ex- 
amine the extent to which BPT and selected 
ACB scores have differential predictive power, 
four BPT scores and six ACB measures were 
employed as predictors. The criteria em- 
ployed were grade in the course and the two 
scores on the JKT, Forms A and B. 

The two highest correlations of grade with 
BPT predictors were: r = .44 with number 
correct, and r = .45 with weighted score. 
The two highest analogous correlations of 
grades with the selected ACB measures were: 
r = .44 with Mechanical Index (a weighted 
composite), and r = .46 with Arithmetic Reasoning. 
Thus, the two classes of predictors 
agree closely in their two highest correlations 
with grades. 


Table 2 

Product-Moment Correlation Coefficients for Scores on 
the BPT and Six ACB Aptitude Index Scores 

                                        BPT Scores 
ACB Variables              1           2           3           4 

Aviation information       .17**       -.25**      .15**       [illegible] 
Mechanical principles      .21**       -.09        .03         [illegible] 
General mechanics          [illegible] 
Mechanical index           [illegible] 
Mechanical key             [illegible] 
Arithmetic reasoning       [illegible] 

* Significant at the 5% level, but less than the 1% level. 
** Significant beyond the 1% level. 

Jointly taken, the weighted score and the 
Arithmetical Reasoning score yielded a multi- 
ple correlation of .53 for predicting course 
grades. The combined use of these two pre- 
dictors in forecasting success on the job- 
knowledge tests resulted in slightly lower 
correlation values: R = .42 and .44 for JKT- 
A and JKT-B, respectively. They thus ac- 
counted for approximately one-fifth of the 
variance in predicting either the final grade 
in the course or job-knowledge test scores. 
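
The step from a multiple correlation to "variance accounted for" is simply squaring R. The sketch below uses the three R values quoted in this paragraph; the dictionary and function names are illustrative only.

```python
# Multiple correlations reported for the weighted BPT score combined with
# Arithmetic Reasoning (values taken from the paragraph above).
multiple_r = {
    "Final course grade": 0.53,
    "JKT-A": 0.42,
    "JKT-B": 0.44,
}


def variance_explained(r: float) -> float:
    """Proportion of criterion variance accounted for by a correlation R."""
    return r ** 2


for criterion, r in multiple_r.items():
    print(f"{criterion}: R = {r:.2f}, R^2 = {variance_explained(r):.2f}")
```

The squared values for the two job-knowledge tests come out a little under one-fifth, and the value for course grade somewhat above it.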

When the weighted score was employed 
with five of the six ACB measures to predict 
the criterion, the multiple R was consider- 
ably increased to .62 for grade in the A. & E. 
course. For JKT-A and JKT-B, the multi- 
ple-correlation values were .501 and .498, re- 
spectively. In two of these three instances 
(viz., school grades and JKT-A), the weighted 
BPT score yielded the highest beta weight, 
.122 and .268, respectively. For JKT-B, the 
Aviation Information score gave the largest 
beta weight (.204), and the weighted BPT 
score the second largest beta weight (.164). 
In summary, it appears that the weighted 
BPT score has some unique predictive value 
with respect to mechanical school achieve- 
ment as compared with the five ACB scores 
studied here. 


Summary and Conclusions 


The present study was designed to investigate 
the value of the use of a test giving a 
choice of problem-solving techniques as a predictor 
of mechanical achievement as assessed 
by final grade in the Airplane and Engines 
Mechanics course, and by scores on two 
mechanical job-knowledge tests. A recently 
developed instrument, the Balance Problems 
Test (BPT), was utilized to obtain an index 
of relative preference for the two problem-solving 
techniques. 


3 The Mechanical Index was omitted since it is a 
composite of the ACB scores. 

By means of a correlational analysis, the 
following conclusions are indicated: 


1. Final grade in the A. & E. course is 
positively related to the number of problems 
correctly solved in the BPT and to the tend- 
ency to prefer principles over facts in their 
solution. 

2. The BPT, as measured by the weighted 
score, is as good a predictor of final 
mechanical school grade as, or a better one than, 
any one of the five standard scores here studied. 
When combined with the other predictors to determine 
the multiple correlation, the BPT weighted 
score provided the highest beta weight for 
predicting mechanical school grade and job 
knowledge as measured by one of the two 
tests. 


Received January 17, 1955. 


References 


1. Bloom, B. S., & Broder, Lois J. Problem-solving 
processes of college students. Suppl. educ. 
Monogr., Univer. of Chicago Press, 1950, No. 
73. 

2. Cross, K. Patricia, & Gaier, E. L. Technique of 
problem solving as a predictor of educational 
achievement. J. educ. Psychol., in press. 

3. Gaier, E. L. Selected personality variables and 
the learning process. Psychol. Monogr., 1952, 
66, No. 17 (Whole No. 349). 

4. Gaier, E. L. The role of knowledge in problem-solving. 
Progressive Education, 1953, 30, 138- 
141. 







A National Answer to the Question, “Do Sons Follow Their 
Fathers’ Occupations?” 1 


Paul G. Jenson * 


Macalester College 


and Wayne K. Kirchner 


Fulbright Scholar, Cambridge, England 


Occupational mobility, social class status, 
and related areas have been the subject of 
many investigations by psychologists, sociolo- 
gists, social workers, economists, and other 
social scientists. One of the questions that 
has aroused considerable interest is the re- 
lationship between fathers’ occupations and 
sons’ occupations. The absence or presence 
of such a relationship would be evidence for a 
social class theory, for vertical occupational 
mobility, and for parental influence in the 
vocational choice-making process. This study 
shows the relationship between fathers’ and 
sons’ occupations in a representative sample 
of the urban population in the United States. 

Past studies have been contradictory in 
their findings regarding parental influence in 
the selection of a vocation. For example, 
Pinney (9) found that fathers’ occupation 
was of little importance in the selection of a 
vocation by high school students. Likewise, 
Kroger and Louttit (4) in a study of high 
school boys also found that the first voca- 
tional choice made by these boys had very 
little relationship to the fathers’ occupation. 

On the other hand, Peters (8) found that 
the most influential factor in the vocational 
choice-making process of high school students 
was the parent. Similarly, Dyer (3) found 
a high degree of permanence in the vocational 
choice when the family was involved in the 
choice. Nelson (6) found that college students 
selected their fathers’ occupations more 
than chance would allow. 


1 The material in this paper is a small part of the 
data accumulated in a study of occupational mobility 
conducted by cooperating university research 
centers and the Committee on Labor Market Research 
of the Social Science Research Council for 
the U. S. Air Force and the U. S. Bureau of the 
Census. The Industrial Relations Center of the 
University of Minnesota was one of the cooperating 
agencies. The authors are indebted to Dr. Dale 
Yoder, Dr. Herbert G. Heneman, Jr., the Director 
and Assistant Director of the Industrial Relations 
Center, and to Mr. Harland Fox who was the IRC 
staff member directly in charge of the survey in the 
St. Paul area. Mr. Fox is now with the National 
Industrial Conference Board. 

* At the time this paper was prepared the authors 
were Research Fellows with the Industrial Relations 
Center, University of Minnesota. 

In studies which bear more directly on the 
question posed in the title, the available evi- 
dence indicates that sons do follow their 
fathers’ general occupational group. Both 
Davidson and Anderson (2) in San Diego 
and Bendix et al. (1) in Oakland found that 
sons tended to enter and remain in occupa- 
tions similar to those of their fathers. Rogoff 
(10), in her recent book, found similar re- 
sults in Marion County, Indiana. (This 
county includes Indianapolis and the sur- 
rounding area.) Because these studies were 
limited to specific geographical areas, their 
application to the whole population is ques- 
tionable. This study, by sampling a wider 
geographical area, attempts to answer the 
question, “Do sons follow their fathers’ oc- 
cupations?” on a nationwide basis. 


Method 


Basic data for this study were collected in 1951 in 
six major cities (Los Angeles, San Francisco, Chi- 
cago, Philadelphia, New Haven, and St. Paul) as a 
part of a major analysis of labor mobility in the 
United States (7). 

Information was obtained by enumerators from 
the Bureau of Census in interviews with over 8,000 
heads of households. Households were selected sys- 
tematically from all households reported in these six 
cities in the 1950 Census of Population and Housing. 
In addition, household units constructed after the 
1950 census and units in large quasi-households were 
also sampled proportionately. 

Much information concerning previous employment 
and occupational mobility was obtained from 
each respondent. This particular report deals with 
just two parts of the data: father’s regular occupation 
and son’s regular occupation. All of the people 
in the sample were over 25 years of age and had 
worked full time with pay at least one month or 
more in 1950. The total sample is representative of 
urban males over the age of 25 in the six cities. 


Table 1 

Number and Percentage of Fathers and Sons in the 
Different Occupational Groups 

Occupational Group                                   Fathers      Sons 
                                                     (Number)     (Number) 

Professional, technical and kindred workers          481 [column uncertain] 
Farmers and farm managers                            1,336 [column uncertain] 
Managers, officials, and proprietors, except farm    1,476 [column uncertain] 
Clerical and kindred workers                         [illegible] 
Sales workers                                        494 [column uncertain] 
Craftsmen, foremen, and kindred workers              1,788 [column uncertain] 
Operatives and kindred workers                       1,970 [column uncertain] 
Service and private household workers                664 [column uncertain] 
Farm laborers and foremen                            104          51 
Laborers, except farm and mine                       574          542 
Total                                                8,292        8,292 

[Percentage columns and the remaining counts are illegible in the source.] 


Results and Discussion 


Number and Percentage of Sons in the Same Occupation as Their Father 


Table 1 shows the frequency of fathers and 
the frequency of sons in the various occupa- 
tional groups. As expected, there have been 
changes in the frequencies within the differ- 
ent groups. Increased numbers of men have 
entered the professional, clerical, sales, opera- 
tive, and service occupational groups. De- 
creases have occurred in the managerial and 
craftsmen occupations as well as with the 
farmers and farm managers group. The large 
change in that group is partly due to the 
actual decrease in the number of farmers over 
the years as well as to the fact that the sam- 
ples in this study were from metropolitan 
rather than rural areas. 


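[Table 2. Fathers’ occupation by sons’ occupation (in code). The body of the table is illegible in the source. Legible row labels for fathers’ occupation: Professional, technical, etc.; Farmers, farm managers; Managers, officials, proprietors; Clerical, etc.; Sales workers; Craftsmen, foremen, etc.; Operatives, etc.; Service and household; Farm laborer, foreman; Laborer, except farm and mine.] 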







Table 3 

Relationship of Sons’ Occupation and Fathers’ 
Occupation in Terms of Nonmanual 
and Manual Occupations 

                              Sons’ Occupation 
Fathers’ Occupation      Nonmanual     Manual     Total 

Nonmanual                1,573         918        2,491 
Manual                   1,651         4,150      5,801 

Total                    3,224         5,068      8,292 





Table 2 shows the relationship between the 
fathers’ regular occupation and the sons’ 
regular occupation. Analyses of the table re- 
veal the following facts. First, in five of the 
ten occupational classifications more sons fol- 
low their fathers’ occupation than any other 
occupation. For example 31 per cent of sons 
of professional fathers also entered profes- 
sional occupations; 27 per cent of sons of 
fathers who were proprietors and managers 
are in similar occupations; 24 per cent of 
sons of fathers in clerical work are likewise 
employed; 31 per cent of sons of craftsmen 
are also craftsmen; 33 per cent of sons of 
fathers engaged in operative work follow this 
type of work. 

There is also a tendency in other occupa- 
tional classifications for sons to enter occupa- 
tional fields other than their fathers’ occupa- 
tion. In general, these involve a step up the 
occupational ladder. This is best illustrated 
by the 25 per cent of sons of fathers engaged 
in household and service work who entered 
operative fields, and by the 34 per cent and 
22 per cent of sons of unskilled fathers who 
entered operative work and craft work, re- 
spectively. 

Somewhat more revealing than these trends 
are the relationships shown in Table 3. Con- 
sideration is given here to manual and non- 
manual occupations and the relationship be- 
tween fathers and sons engaged in those oc- 
cupations. As Table 3 shows, sons follow 
their fathers’ general type of work. A 
large percentage of sons of fathers engaged 
in manual occupations are likewise engaged 
in manual occupations: over 71 per cent of 
such sons enter manual occupations. Similarly, 
63 per cent of sons of fathers engaged 
in nonmanual occupations also enter non- 
manual occupations. The significance of 
these results was shown by a chi square of 
802 (corrected for continuity) which is 
highly significant statistically. 
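
The percentages quoted from Table 3, and a continuity-corrected chi square for the same 2 x 2 table, can be recomputed with a short sketch. The counts are those of Table 3; the function name `yates_chi_square` is hypothetical, and the formula is the textbook Yates correction, which is an assumption about how the original statistic was obtained. The value it yields need not reproduce the reported 802 exactly, since the computational details are not given in the paper.

```python
def yates_chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi square for the 2 x 2 table [[a, b], [c, d]] with Yates' correction."""
    n = a + b + c + d
    numerator = n * (abs(a * d - b * c) - n / 2) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Counts from Table 3 (rows: fathers' occupation; columns: sons' occupation)
nonmanual_fathers = {"nonmanual": 1573, "manual": 918}
manual_fathers = {"nonmanual": 1651, "manual": 4150}

pct_manual_stay = 100 * manual_fathers["manual"] / sum(manual_fathers.values())
pct_nonmanual_stay = (
    100 * nonmanual_fathers["nonmanual"] / sum(nonmanual_fathers.values())
)
chi2 = yates_chi_square(1573, 918, 1651, 4150)

print(f"Sons of manual fathers in manual work: {pct_manual_stay:.1f}%")
print(f"Sons of nonmanual fathers in nonmanual work: {pct_nonmanual_stay:.1f}%")
print(f"Continuity-corrected chi square: {chi2:.1f}")
```

Whatever the exact value, a chi square of this size with one degree of freedom is far beyond conventional significance levels, which is the point the text relies on.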


Conclusion 


The over-all evidence then, from this study, 
reveals that sons do tend to follow their 
fathers’ general type of occupations. When 
sons do not, they generally tend to make a 
jump up the occupational ladder. It would 
seem that mobility is greater from the bot- 
tom of the ladder toward the top. It would 
also seem that a father’s occupation does have 
an important influence upon what a son will 
do in his future work. 


Received January 18, 1955. 


References 


1. Bendix, R., Lipset, S. M., & Malm, F. T. Social 
origins and occupational career patterns. 
Industr. & labor relat. Rev., 1954, 7, 246-261. 

2. Davidson, P. E., & Anderson, H. D. Occupational 
mobility in an American community. 
Stanford University: Stanford Univer. Press, 
1937. 

3. Dyer, Dorothy T. The relation between vocational 
interests of men in college and their 
subsequent occupational histories for ten years. 
J. appl. Psychol., 1939, 23, 280-288. 

4. Kroger, R. M., & Louttit, C. M. Influence of 
father’s occupation on the vocational choices 
of high school boys. J. appl. Psychol., 1935, 
19, 203-212. 

5. Lipset, S. M., & Bendix, R. Social mobility and 
occupational career patterns. Amer. J. Sociol., 
1952, 57, 366-374. 

6. Nelson, E. Fathers’ occupations and students’ 
vocational choices. Sch. & Soc., 1939, 50, 
572-576. 

7. Palmer, Gladys L., & Brainerd, Carol P. Labor 
mobility in six cities. New York: Social Science 
Research Council, 1954. 

8. Peters, E. F. Factors which contribute to 
youth’s vocational choice. J. appl. Psychol., 
1941, 25, 428-430. 

9. Pinney, M. The influence of home and school 
in the choice of a vocation. J. educ. Res., 
1932, 25, 286-290. 

10. Rogoff, Natalie. Recent trends in occupational 
mobility. Glencoe, Ill.: Free Press, 1953. 







Population Stereotypes in Pedal Control of a “Ball-Bank” 
Indicator 


B. R. Bugelski 


University of Buffalo 


Chapanis (1), Fitts (2), and Warrick (3) 
have described ‘“‘population stereotypes” for 
a variety of controls and control-display re- 
lationships. Their work suggests that some 
forms of movement are more “natural” in cer- 
tain situations than are others, and that in- 
strument and control-device designers should 
consider such natural tendencies in the de- 
velopment of instrument displays and corre- 
sponding controls. Thus far, no related work 
has been reported for instrument displays 
where the indicator is controlled by foot mo- 
tion, probably because of the infrequent use 
of pedal controls that involve any instru- 
mental display. 

There is one indicator in all aircraft, however, 
that does function primarily in terms of 
foot control; this is the so-called “ball-bank” 
indicator which shows the relative degree of 
coordination of stick and rudder in turns. 
This indicator consists of a gravity-controlled 
ball which moves to the right or left in a 
curved tube as a plane banks. The move- 
ments of the ball are dampened by a fluid. 
To achieve balanced or coordinated turns the 
pilot tries to keep the ball centered by push- 
ing rudder pedals appropriately. Pushing the 
right pedal results in a ball movement to the 
left, and vice versa. Beginning students in 
flying frequently push the wrong pedal. The 
frequency of this error suggests that there 
might be a “natural” tendency to do so. 

The tube of the standard instrument curves 
upward at both ends. For another purpose 
it was desirable to invert the tube so that its 
ends would turn downward. When this operation 
was first considered it also appeared 
to invite a more appropriate movement of the 
pedals than does the original display. The 
present investigation was undertaken because 
no information was available about the population 
stereotypes involved. The question to 
be answered is: How frequently will untrained 
subjects push a pedal corresponding to the 
displacement of the indicator? 


1 This study was conducted for the Stanley Aviation 
Company, Buffalo, New York in conjunction 
with the U. S. Navy Special Devices Center, contract 
number N9ONR-93103. 


Method 
Apparatus 


A mock-up instrument panel was constructed and 
mounted on a supporting structure which contained 
foot pedals at a convenient height for a subject (S) 
to operate while sitting in front of the display. The 
two forms of the display were mounted side by side 
with concealing screens. The indicators were pre- 
pared with standard bezels and provided reasonable 
facsimiles of regular instruments. The moving ball 
was controlled by a knob at the rear. The experi- 
menter could set the ball at any desired position and 
move it back and forth in demonstrating the display 
for the subject. 


Procedure 


The necessity for obtaining unbiased reactions to 
both indicators with the ball set either to the left or 
to the right of center for each display involved a 
slight complication of experimental design. It was 
necessary for half the Ss to react to one display first 
and to the other second, and for the other half to 
reverse this process. In addition, it was considered 
that the position of the ball (i.e., left or right of 
center) in the first display might influence the Ss in 
some fashion in judging the second display, and so 
the ball position had to be retained for half of the 
Ss and reversed for the other half. The general ar- 
rangement of the experimental design can be inferred 
from Table 1 where the results are presented. 
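
The counterbalancing described above can be sketched as a crossing of three two-level factors. The factor labels below are paraphrases of the text, and the equal allocation of eight subjects per cell is an assumption inferred from the 64 subjects and the 16 subjects shown per index position for the first display; the paper does not state cell sizes directly.

```python
from itertools import product

display_orders = ("standard first", "inverted first")
first_positions = ("right", "left")        # index position on the first display
second_relations = ("same", "opposite")    # second position relative to the first


def opposite(position: str) -> str:
    """Return the other side of center."""
    return "left" if position == "right" else "right"


conditions = []
for order, first_pos, relation in product(
    display_orders, first_positions, second_relations
):
    second_pos = first_pos if relation == "same" else opposite(first_pos)
    conditions.append(
        {"order": order, "first_index": first_pos, "second_index": second_pos}
    )

SUBJECTS = 64
per_cell = SUBJECTS // len(conditions)  # assumes equal cell sizes

for cell in conditions:
    print(cell, "n =", per_cell)
```

Crossing the factors in this way gives eight cells, so that display order, index position, and the repetition or reversal of that position are each balanced across subjects.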


Subjects 


Sixty-four young male (ages 18-24) college stu- 
dents, volunteers from psychology classes, naive with 
respect to aviation, were used. Each was tested in- 
dividually and was instructed to react with which- 
ever foot he thought proper in order to center the 
index. He was told that the index would be shown 
either to the left or right of center. He was in- 
formed that the test was an inquiry to find out what 
was the natural thing to do, that there were no 
tricks, that intelligence was not involved. As well 
as could be determined by inquiry, the Ss did what 
they were asked to do and did not try to outwit the 
experimenter. Intensive interrogation of some Ss 
could not shake their decisions. They insisted that 
their responses were “natural.” 


Results 


All foot responses were scored as ‘‘corre- 
sponding” if the S pressed the pedal on the 
side shown by the indicator. All responses 
of the foot opposite the indicator were scored 
as noncorresponding. 

From Table 1 it is apparent that a corresponding 
choice of pedal was made by only 9 
Ss out of the 32 who reacted to the standard 
display first. This figure does not even approach 
the chance expectancy of 16, and the 
deviation from chance is such as to occur less 
than twice in one hundred (chi-square p = 
less than .02). If we add the choices of the 
other 32 Ss who viewed the standard display 
after the inverted one, we find no different 
result; in fact, again only 9 Ss chose the corresponding 
pedal. Combining the results for 
the 64 Ss on the standard display, we find 
then, only 18 correct choices where 32 could 
be expected to occur by chance. Such a deviation 
could be expected to occur less than 
once in 100 choices (chi-square p = less than 
.01). In summary, the great majority of Ss 
elected to use the noncorresponding pedal 
when viewing the standard display. 


Table 1 

Choices Reflecting the Population Stereotype in 
Selecting the Pedal Motion to Adjust Two 
Forms of a “Ball-Bank” Display. Only 
“Corresponding” Choices are Shown 

Form of Display                      N              No. Corresponding 

Standard Display Shown First 
  Index on right                     16             [illegible] 
  Index on left                      16             [illegible] 
Standard Display Shown Second 
  Index on right*                    [illegible]    [illegible] 
  Index on left*                     [illegible]    [illegible] 
  Index on right**                   [illegible]    [illegible] 
  Index on left**                    [illegible]    [illegible] 
  Totals                             [illegible]    [illegible] 

Inverted Display Shown First 
  Index on right                     16             [illegible] 
  Index on left                      16             [illegible] 
Inverted Display Shown Second 
  Index on right*                    [illegible]    [illegible] 
  Index on left*                     [illegible]    [illegible] 
  Index on right**                   [illegible]    [illegible] 
  Index on left**                    [illegible]    [illegible] 
  Totals                             [illegible]    19 

* These settings corresponded to the same direction 
of displacement as presented on the first display 
shown. 

** These settings were opposite to those shown in 
the first display. 

The findings with the inverted display 
proved no better. Again a small minority 
made the corresponding choice whether this 
dial was displayed first or second. Only 19 
out of 64 Ss chose the corresponding pedal 
(p = less than .01). Further examination of 
the table shows that the Ss were not inclined 
to select the corresponding pedal regardless 
of whether the index was displayed with a 
left or right deviation. 
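
The chi-square comparisons against chance reported in this section can be reproduced approximately with a one-degree-of-freedom goodness-of-fit test. This is a sketch under the assumption that no continuity correction was applied, which the paper does not state; the function name `chi_square_vs_chance` is hypothetical.

```python
import math


def chi_square_vs_chance(corresponding: int, total: int) -> tuple[float, float]:
    """Chi square (1 df) and p-value against a 50:50 chance split."""
    expected = total / 2
    observed = (corresponding, total - corresponding)
    chi2 = sum((o - expected) ** 2 / expected for o in observed)
    # Upper-tail probability of a chi-square variate with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Counts of "corresponding" choices quoted in the text
cases = {
    "Standard display shown first": (9, 32),
    "Standard display, all Ss": (18, 64),
    "Inverted display, all Ss": (19, 64),
}

for label, (k, n) in cases.items():
    chi2, p = chi_square_vs_chance(k, n)
    print(f"{label}: {k}/{n}, chi-square = {chi2:.2f}, p = {p:.4f}")
```

Under this assumption the first comparison falls between the .02 and .01 levels and the two 64-subject comparisons fall below .01, in line with the probabilities stated in the text.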


Discussion 


It is obvious from the data that the “natu- 
ral” response to the control-display relation- 
ship studied here is to press a pedal on the 
side opposite that of the indicator displace- 
ment. When the indicator is on the right, 
Ss press with the left foot and vice versa. 
This is true whether the display form is of 
the standard or inverted variety. An inter- 
esting implication is the finding that the 
standard display currently used in aircraft 
gives an indication that normally invites the 
use of the foot opposite that of the index 
deviation. Because such a reaction is, in ef- 
fect, an error, it can be concluded that the 
standard display operates in a fashion that 
opposes the population stereotype. The 
“natural” response appears to be to use the 
“wrong” foot. Because so many pilots have 
already learned to perform an “unnatural” 
response, a recommendation to change the 
display is not necessarily in order. 

In describing their reactions, Ss reported 
some type of hand-foot generalization which 
amounts to a tendency to reach for some- 
thing on the right with the left hand and vice 
versa for centering purposes. Such reports 







suggest a general tendency to pull things over 
to a side rather than to push them over. 
Some such action appears to operate in ma- 
nipulating steering wheels, for example, where 
the driver appears to pull the wheel in the di- 
rection of the desired turn rather than push 
it. Further investigation is required if the 
suggestions just made are to be considered 
seriously. 


Summary and Conclusions 


1. Sixty-four young male college students 
were asked to press simulated rudder pedals 
to center ball-type indicators. One indicator 
was a simulated standard ball inclinometer. 
The other was one with the standard tube 
inverted and more angular. 

2. The subjects failed to approach even a 
chance distribution of correct responses in 
using either instrument. The differences from 
chance are highly significant. 


3. It is concluded that the present aircraft 
instrument offers a display which is contrary 
to the population stereotype and invites er- 
roneous responses from untrained subjects. 

4. The reactions of the subjects support 
the conclusion that the population stereotype 
for pedal action in centering off-center indi- 
cators is to use the foot opposite to the direc- 
tion of displacement. 


Received January 24, 1955. 


References 


1. Chapanis, A., Garner, W. R., & Morgan, C. T. 
Applied experimental psychology. New York: 
Wiley, 1949. 

2. Fitts, P. M. Engineering psychology and equipment 
design. In S. S. Stevens (Ed.), Handbook 
of experimental psychology. New York: 
Wiley, 1951. 

3. Warrick, M. J. Direction of motor preferences in 
positioning visual indicators by means of control 
knobs. Amer. Psychologist, 1947, 2, 345. 
(Abstract) 







Response Preferences in Display-Control Relationships 1 


Sherman Ross, B. E. Shepp, and T. G. Andrews 


University of Maryland 


Recent interest in the field of applied ex- 
perimental psychology has been centered on 
the characteristics of display-control relation- 
ships (1, 6, 7, 8, 9, 10, 11). This study is an 
exploration of the usefulness of a paper-and- 
pencil technique in the analysis of display- 
control relationships. The results of Warrick 
(11) have indicated the value of such a 
technique. The recent review by Andreas 
and Weiss (2) has described the pertinent 
studies. We shall consider only those in- 
vestigations of immediate concern. Warrick 
(10) was interested in the manner in which 
Ss adjusted a rotary control knob in relation 
to a row of lights which constituted the dis- 
play. He found that when controls were 
mounted on the same plane as the display, 
turning the control in a clockwise direction 
to move the indicator to the right seemed to 
be most “natural” for the S. To move the 
indicator to the left, the fitting response was 
a counterclockwise adjustment. When the 
control was placed on a plane perpendicular 
to the plane of the display, responses were 
more variable. In another study, Warrick 
(11) reported that relationships in which the 
control was on the same plane as that of the 
display were superior in terms of perform- 
ance. In a study involving a semicircular 
cluster of lights as the display, performance 
was less variable when the relationship be- 
tween the display and the control was “ex- 
pected” or “natural.” 

Similar studies by Mitchell and Vince (7) 
have demonstrated that Ss prefer relationships 
of an expected nature. A study by 
Norris and Spragg (8) has indicated that 
performance in a two-hand coordination task 
is better if there is “continuity” between the 
plane and direction of movement of the control 
and the plane and the direction of movement 
of the display. Carter and Murray (5) 
have reported the effects of certain display-control 
relationships in which signal movement 
on a scope face was related to the vertical 
or horizontal rotation of the control knob. 
The most efficient relationship occurs when 
the knob rotating vertically controls vertical 
movement of the signal, and the knob rotating 
horizontally controls horizontal 
movement of the signal. 


1 We should like to express our thanks to R. C. 
Hackman, University of Maryland, for aid on the 
analysis of the data and review of the report. 

These studies emphasize that S brings to a 
display-control situation certain habits which 
manifest themselves as response preferences; 
such preferences might be modified by train- 
ing. Under military or industrial conditions, 
however, we are concerned that S will not 
revert to his older highly reinforced “natural” 
habit, along the lines of Jost’s law. Vince 
(9) has studied the effects of “expected” 
and “unexpected” directional relationships be- 
tween display and control and the effects of 
situational stress on performance. The stresses 
involved an increased rate of task perform- 
ance, distracting stimuli, and a secondary 
task. Performance with the expected display-control relationship
was better than with the unexpected relationship. The effect of
increased rate was not great, and the distractions and secondary
task had about the same detrimental effect on performance.


Fig. 1. A composite showing the signal, the three control devices
and their directions of movement, and the three planes upon which
the devices appeared. The push-pull control is shown on the
horizontal plane “1,” the rotary control on the frontal plane “2,”
and the lever on the vertical-lateral plane “3.”

Our present interest is in the highly rein- 
forced habits and expectations which S has 
brought with him to the display-control situa- 
tion. What are his expectations of the rela- 
tionships between the display and the con- 
trol? We have tried to assay the frequency 
and magnitude of the preferences in a paper- 
and-pencil test using three types of controls 
in three different planes to achieve a fixed 
change in an hypothetical scope signal.


Method and Procedure 
Subjects 


A total of 679 Ss were used. Of these, 224 were 
male students enrolled in the AFROTC program at 
the University of Maryland. The remainder of the 
group were undergraduate students at the Univer- 
sity (210 males, 245 females) secured from larger 
sections of Psychology and Bacteriology. 


Procedure


A simple paper-and-pencil display-control situation was used (see
Fig. 1). The display was a circle with a dot in the center to
represent a small light signal on an imaginary scope face. Three
different controls were used: (A) rotary control, (B) push-pull
control, and (C) lever control. Three different planes were selected
for presenting each control. The S was told to indicate the
appropriate adjustment of the control, so that the signal would move
in a given direction in each variation. The signal movement called
for was: right, left, up, or down. The plane of movement of the
display was fixed in the frontal plane. The Ss were instructed about
the possible movements for each of the controls. The rotary control
knob could be moved either clockwise or counterclockwise. The
push-pull control could be moved either in towards the plane on which
it was mounted or pulled out from it. The lever control could be
moved in any one of four directions: right, left, up, and down. These
directions were placed on the test sheet, and arrows indicated the
direction in which the controls could be moved. The S circled the
arrow corresponding to the position to which he would adjust the
control to achieve the


desired change in the signal. The S marked his re- 


Table 1 


Analysis of Responses Made for Different Controls, Instructions, and Planes 








Number and Proportion of Responses 





Plane 3 


Plane 2 


Instruc- 


tion 
Right 
Left 
Up 
Down 


Control 


Rotary 


Right 
Left 
Up 


Down 


Push-Pull 


Right 
Left 


Up 
Down 


Clockwise 
34 (.71) 
18 (.35) 
31 (.62) 
20 (.36) 


In 
32 (.57) 
27 (.49) 
41 (.79) 
24 (.36) 

Right 
29 (.71) 
11 (.28) 


Up 


19 (.50) 
16 (.47) 


Counter- 


clockwise 


33 (.65)* 
19 (.38) 
36 (.64)* 


Out 


14 (.29)** 


44 (.79) 
i2 (.21) 
31 (.57) 
20 (.36) 


In 


Counter 
Clockwise clockwise 


2(2a""" 
44 (.79)*** 
23 (.43) 
36 (.64)* 


Out 





24 (.43) 
28 (.51) 


it (2a"" 


43 (.64)* 


Left 


12 (.29)** 
29 (.72)** 


Down 


19 (.50) 
18 (.53) 


33 (.79) 
29 (.59) 
41 (.73) 
26 (.47) 


36 (.71) 
13 (.75) 


34 (.74) 
21 (.42) 


9 (21)"* 


41 (.41) 
13 (27)""" 


15 (.29)** 


40 (.25)*** 


Down 


12 (.26)*** 
"29 (.58) 





* Significant at the .05 level of confidence. 
** Significant at the .01 level of confidence. 
*** Significant at the .001 level of confidence. 


Counter- 
clockwise 
13: (.27)"" 
35 (.64)* 

2 (227""" 
sé (75° 


Clockwise 
35 (.73) 
20 (.36) 
35 (.78) 
14 (.27) 


In Out 
18 (.42) 25 (.58) 
38 (.69) 7 Ga)" 
32 (.55) 26 (.45) 

13 (.30)** 


Left 


35 (.69) 16 (.31)** 
24 (.56) 31 (.44) 


Down 


19 (.65) 36 (.35)* 
26 (.49) 27 (.51) 







sponse to only three test items. Each test item con- 
sisted of a schematic isometric projection similar to 
Fig. 1, except that only one control on one plane and 
a single desired direction of signal movement were
involved. Thus, we limited our data to the initial 
reactions of the Ss to the situation, and secured 
three responses only from each S. The test items 
were arranged to randomize the presentations of the 
control devices, the planes, and the desired direction 
of signal movement. The tests were group adminis- 
tered. Each group was given general instructions, 
and each test sheet had specific instructions. S was 
allowed 2 min. to complete the three responses. 


Results and Discussion 


In Table 1 we have presented the general 
results of the study. The table shows for 
each control device and each plane the fre- 
quency and the proportion with which Ss in- 
dicated one of the available movements of 
the control as the appropriate one to achieve 
the desired movement of the signal. It may 
be noted that the number of Ss responding to 
each arrangement varies slightly. This was 
due to the fact that certain Ss failed to com- 
plete some or all of the choices. It was our 
purpose to determine whether or not any 
preferences existed for any of the available 
responses as plane and control were changed. 
For Control A (Rotary) and Control B
(Push-pull) the t test for proportions was
used to demonstrate statistical significance 
between responses. These results are also 
shown in the table. Control C (Lever) had 
four permissible responses. The Ss, how- 
ever, regarded two pairs—right-left, and up- 
down—as being mutually exclusive, with only 
a few exceptions. 
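
The report does not give the formula behind its test for proportions. As a rough illustration only, the sketch below tests a two-choice split against an even split (p = .5) using the normal approximation, which is a stand-in and not necessarily the authors' exact procedure. The function name `proportion_test` and the example counts (34 against 14, a pair of values of the kind that appears in Table 1) are chosen here for illustration.

```python
import math


def proportion_test(count_a, count_b, p0=0.5):
    """Normal-approximation test of a two-choice split against p0.

    This is an assumed stand-in for the paper's "t test for proportions";
    the original computation is not described in the source.
    """
    n = count_a + count_b
    p_hat = count_a / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / standard_error
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Illustrative split: 34 Ss choose one movement, 14 choose the other.
z, p_value = proportion_test(34, 14)
print(f"z = {z:.2f}, two-sided p = {p_value:.4f}")
```

Under these assumptions a split of this size falls between the .01 and .001 levels, which is in line with the double-asterisk marking used in the table.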

Some conclusions may be drawn at this 
point from an examination of the table. All 
of the display-control relationships for Plane 
1 are significant (p < .05), except for Ro- 
tary, up; Push-Pull, right and left; and 
Lever, up and down. For Plane 2, all rela- 
tionships are significant except for Push-Pull, 
left and down; Rotary, up; and Lever, down. 
For Plane 3, the exceptions are Push-Pull, 
right and down; and Lever, left and down. 

The results shown in Table 1 clearly per- 
mit a conclusion that for particular control- 
display instruction combinations there exist 
response preferences, demonstrated in this paper-and-pencil test,
and similar to those reported by other workers.

In the foregoing statements we mentioned 
that such terms as “natural” and “expected” 
were frequently used in this context. The 
term “continuity” has been employed. A 
study of the literature reveals that these 
terms all refer to the same relationships. In 
our laboratory (3, 4) we have used the term 
“congruency” to describe these relationships. 

One could argue that congruency between 
the direction of movement of controls and di- 
rection of movement of the display exists if 
there is a relationship of identity. For ex- 
ample, in the studies cited earlier, a clock- 
wise turn of the rotary control knob tended 
to be associated with a right movement of the 
display indicator. This kind of relationship 
held for other directional movements also. 

The importance of the plane on which the 
control is mounted is not to be overlooked. 
For example, the small dot of our display 
“moves” along either the horizontal or verti- 
cal axis. The relationship may be modified 
by changing the control plane. For example, 
when the push-pull control is located on 
Plane 1, we find S’s preferences to lie in the 
vertical axis. Considering Plane 2 for push- 
pull control, there is to be expected no pref- 
erence for either axis. However, when the 
instruction right or up is given, the choices 
are significantly different. There is a possi- 
bility that the findings we report are mainly 
a property of the particular test patterns em- 
ployed. The particular details shown in Fig. 
1 may have biased S to make certain choices. 
As the next phase of our program we propose 
to investigate the preference patterns of Ss 
using actual apparatus. Then we propose to 
investigate the effects of stress. 

For the moment, we can conclude that at 
least two factors are important where con- 
gruency is concerned: (a) similar or identical 
directional movement relationships between 
the display and the control, and (b) the
movement of the control in its plane with re- 
spect to the directional axis of the display 
within the display plane. The importance of 
further investigation is clear. The advantages of describing
display-control situations as to whether they conform to Ss’
expectation patterns are obvious.


Summary 


The experiment was directed at the deter-
mination of response preferences under varied
conditions. Using a group-administered pa- 
per-and-pencil test, three responses from each 
of 679 Ss were obtained for three different 
control devices (rotary knob, push-pull, and 
lever) which were arranged on three differ- 
ent planes. The display was held constant 
and responses were obtained for desired sig- 
nal movements of right, left, up, and down. 

The results indicate that response prefer- 
ences do exist. These preferences are found 
under certain conditions, and vary with the 
control, the plane, and the desired signal 
movement. 


Received February 17, 1955. 


References 


1. Andreas, B. G. Bibliography of perceptual-motor performance under
varied display-control relationships. Rochester, N. Y.: Univer. of
Rochester, 1953. (Sci. Rep. No. 1, Contract AF 30 (602)-200.)
Pp. 1-17.

2. Andreas, B. G., & Weiss, B. Review of research on perceptual-motor
performance under varied display-control relationships. Rochester,
N. Y.: Univer. of Rochester, 1954. (Sci. Rep. No. 2, Contract
AF 30 (602)-200.) Pp. 1-117.

3. Andrews, T. G., & Ross, S. Summary report on studies of behavioral
efficiency. College Park, Md.: Univer. of Maryland, 1955. (Project
No. DA-49-007-MD-222 (O.I. 19-52).)

4. Bowen, J. H. Effects of preliminary types of training on subsequent
discriminative-motor learning. Unpublished doctor’s dissertation,
Univer. of Maryland, 1955.

5. Carter, L. F., & Murray, N. L. A study of the most effective
relationships between selected control and indicator movements. In
P. M. Fitts (Ed.), Psychological research on equipment design.
Washington: U. S. Government Printing Off., 1947. (AAF Aviat.
Psychol. Program Res. Rep.)

6. Human engineering; a selected bibliography and a guide to the
literature. Compiled by Reference Section, ASTIA Reference Section,
Library of Congress, Washington, D. C., August, 1953. Pp. 1-35.

7. Mitchell, M. J. H., & Vince, M. A. The direction of movement of
machine controls. Quart. J. exp. Psychol., 1951, 3, 24-35.

8. Norris, Eugenia B., & Spragg, S. D. S. Studies in complex
coordination. Performance on the two-hand coordinator as a function
of the relations between direction of rotation of controls and
direction of movement of display. J. Psychol., 1953, 35, 119-129.

9. Vince, Margaret A. Learning and retention of an “unexpected”
control-display relationship under stress conditions. Med. Res.
Council, Appl. Psychol. Unit, Psychol. Lab., Cambridge, Eng.
APU 125/50, 1950.

10. Warrick, M. J. Direction of movement preference in positioning
visual indicators by means of control knobs. Amer. Psychologist,
1947, 2, 345. (Abstract)

11. Warrick, M. J. Direction of motion stereotypes in positioning a
visual indicator by use of a control knob. II. Results from a printed
test. USAF, Eng. Div. AMC (Memo report No. MCREXD-694-19a),
October 28, 1948.





The Journal of Applied Psychology 
Vol. 39, No. 6, 1955 


Stroke Width, Illumination Level, and Figure-Ground 
Contrast in Numeral Visibility 1


Robert S. Soar 


Vanderbilt University 


The problem of how to design letters and 
numerals which will be read most rapidly and
accurately has received attention in the past 
with reference to such uses as highway signs, 
automobile license plates, and timetables. Re- 
search is currently being conducted by the 
Armed Forces to improve the visibility of 
numerals on instrument dials, plotting boards 
in filter centers, and other equipment. Among 
the aspects of numeral design which have re- 
ceived attention in a number of studies has 
been the factor of the width of stroke, or 
boldness with which the symbol is drawn, and 
factors which might be expected to influence 
the stroke width which would be optimal in 
a particular situation. The results of some 
of these studies are in conflict. 

Three studies (5, 12, 13) have reported 
data suggesting an interaction of illumination 
level and stroke width, but have not tested it. 
The first used a range of illumination and 
brightness from 80 foot-candles to .35 milli- 
lamberts; in the second and third, brightness 
levels were ordered, but not quantified. Other 
studies (11, 14) which have tested the inter- 
action have not found it to exist or have 
found it to be of minor importance. The first 
used brightnesses ranging from 3 to 31 foot- 
lamberts; the second, reported in abstract, 
does not specify the brightnesses employed. 
Insofar as the values have been reported (5, 
11), interaction effects appear only for the 
wider range of brightnesses.

Studies dealing with differing modes of figure-ground contrast (white
on black or black on white) have also resulted in contradictory
conclusions. One investigator (4), in a study typically cited as the
classic one of all numeral visibility studies, concluded that white
on black was more visible than black on white and that it required a
different stroke width for optimal visibility; but no statistical
tests were reported, and the data themselves render the conclusion
doubtful. Observations were made outside, in the morning, under
diffuse daylight. Another study (7) supports the conclusion that
white on black is more visible, presumably with brightnesses ranging
from .0001 to approximately .02 foot-lambert. Still others conclude
variously that (a) the two presentations do not differ in visibility,
nor in optimal stroke width (11), or that (b) black on white is
superior for letters in standard print forms (9, 15).


1 This research was supported in part by funds made available by
Vanderbilt University. The author is solely responsible for the
statements made in this report.

If stroke width and/or mode of figure- 
ground contrast interact with illumination 
level, then differences in illumination level 
from study to study may account for the dif- 
ferences in results. 

As a practical matter, the illumination lev- 
els under which it may be important to read 
numerals on signs or as identifying symbols 
on vehicles or craft vary from a maximum of 
about 10,000 foot-candles (3) (the illumina- 
tion of the noon sun in midsummer) to al- 
most no light at all. As examples of low lev- 
els of illumination which are the maxima for 
particular situations, the Armed Forces NRC 
Vision Committee (1) recommends 0.1 foot- 
lambert as representative of night lighting on 
instruments, and Craik (7) found 0.001 foot- 
lambert to be the highest brightness level 
usable for night lighting of instruments with- 
out creating afterimages which delayed identi- 
fication of silhouettes in a simulated night sky. 
This range of illumination is great enough 
that an interaction of illumination level with 
either of the other variables would be likely 
to be practically important, and much greater 
than any of the experiments reported have 
studied. 

Problem 


The questions, then, to which answers are needed are:

1. Does stroke width interact with illumination level in determining
numeral visibility?
2. Does stroke width interact with mode of figure-ground contrast?
3. Does illumination level interact with mode of figure-ground
contrast?
4. Do all three of these variables interact with each other?


Method 


The stimulus numerals were hand-drawn in large
size and reduced photographically to about the size 
of 10-point type, both black on white and white on 
black, and in stroke width to height ratios of 1:4 
and 1:16. These stroke widths represent the ap- 
proximate limits of the optimal range found in other 
studies. The forms of the numbers followed those 
found to be optimal by Brown et al. (5) and veri- 
fied by Atkinson et al. (2). 

Eight experimental conditions were studied, all 
combinations of three variables at two levels each. 
These were stroke width and mode of figure-ground 
contrast as indicated above, and illumination at .5 
foot-candles and at 500 foot-candles. 

Forty college student subjects were used, all with 
100% normal visual acuity. The procedure followed 
was a variant of the method of constant stimuli, in 
which each subject observed the stimulus numbers 
at three different distances under one of the com- 
binations of experimental conditions. The score for 
each subject was an estimated threshold distance at 
which he would have been expected to read half of 
the numbers correctly. 
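
The detailed scoring procedure was deposited separately and is not reproduced in the article. Purely as a sketch of how a 50% threshold distance might be estimated from readings at three distances, the example below interpolates linearly between the two distances that bracket the criterion. Linear interpolation, the function name `threshold_distance`, and the sample proportions are all assumptions made for illustration, not the author's documented method or data.

```python
def threshold_distance(distances, proportions_correct, criterion=0.5):
    """Estimate the distance at which accuracy crosses the criterion.

    Assumes accuracy falls as distance grows and interpolates linearly
    between the two observed distances that bracket the criterion.
    """
    pairs = sorted(zip(distances, proportions_correct))
    for (d0, p0), (d1, p1) in zip(pairs, pairs[1:]):
        if p0 >= criterion >= p1 and p0 != p1:
            fraction = (p0 - criterion) / (p0 - p1)
            return d0 + fraction * (d1 - d0)
    raise ValueError("criterion is not bracketed by the observed distances")


# Hypothetical subject: proportion of numerals read correctly at 2, 3, and 4 m.
estimate = threshold_distance([2.0, 3.0, 4.0], [0.9, 0.6, 0.2])
print(f"estimated 50% threshold: {estimate:.2f} m")
```

A fitted psychometric function would be another reasonable choice; the interpolation is used here only because it needs no further assumptions about the shape of the curve.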

Since this is, in general, a procedure which has 
been cited before, but which has not been described 
in sufficient detail to make duplication at all certain, 
the procedure followed here has been described in 
detail and deposited with ADI.2


Results and Discussion 


The procedure by which the data were sum- 
marized, as indicated above, reduced all the 
scores to a common scale of measurement, 
namely, the distance at which they could be 
read to a standard criterion of accuracy. 

The mean distances at which the numerals 
were read under each of the experimental con-
ditions are shown in Table 1.

2 Order Document No. 4682 from ADI Auxiliary Publications Project,
Photoduplication Service, Library of Congress, Washington 25, D. C.,
remitting in advance $1.25 for microfilm or $1.25 for photocopies.
Make checks payable to Chief, Photoduplication Service, Library of
Congress.


Table 1


Mean Estimated Distance* at Which Numerals Would Have Been Read to a
50% Criterion of Accuracy


Black on White


White on Black

Illumination Level

500 ft-c.
.5 ft-c.


1:4
3.17
Lz@


* In meters.


Analysis of the data was carried out by analysis of variance using a
procedure outlined by Edwards (8). The assumptions of homogeneity of
variances and normality of distribution required of the data by the
analysis were examined by probit analysis as outlined by Johnson
(10). The data showed a tendency toward heterogeneous variances in
that both means and variances were larger for the higher illumination
conditions than for the low; accordingly, a square-root
transformation was applied to the data. This rendered the variances
more homogeneous, although the possibility of heterogeneity remained,
to be taken into account in interpreting the analysis. The results of
the analysis are shown in Table 2.
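
To illustrate why a square-root transformation is a natural choice when variances grow with the means, the sketch below compares the ratio of group variances before and after the transformation. The two groups of scores are invented for the illustration and are not the study's data; only Python's standard `statistics` and `math` modules are used.

```python
import math
import statistics

# Hypothetical threshold distances (meters) for two illumination groups.
# The higher-illumination group has both a larger mean and a larger spread.
low_illumination = [1.0, 1.2, 1.4, 1.1, 1.3]
high_illumination = [2.5, 3.5, 3.0, 4.0, 2.8]


def variance_ratio(group_a, group_b):
    """Ratio of the larger sample variance to the smaller one."""
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    return max(var_a, var_b) / min(var_a, var_b)


ratio_raw = variance_ratio(low_illumination, high_illumination)
ratio_sqrt = variance_ratio(
    [math.sqrt(x) for x in low_illumination],
    [math.sqrt(x) for x in high_illumination],
)
print(f"variance ratio, raw scores:       {ratio_raw:.1f}")
print(f"variance ratio, square-root scale: {ratio_sqrt:.1f}")
```

With these invented scores the ratio shrinks but does not reach 1, which mirrors the remark that some heterogeneity remained after the transformation.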

Neither stroke width nor mode of figure- 
ground contrast influenced visibility. The re- 
sults for stroke width are a confirmation of 
earlier studies, since the values selected were 
the extremes of the optimal range found by 
others. The finding that the white and black 
numbers used here did not differ in visibility 
cannot be generalized because of the inter- 
actions discussed below. 

Illumination level showed a highly signifi- 
cant influence on numeral visibility as would 
be expected. 

The interaction of stroke width and illumi- 
nation level was significant beyond the one 
per cent level in the direction of wider stroke 
widths being optimal under low illumination. 

The interaction of stroke width and mode 
of figure-ground contrast was significant at 
the one-tenth of one per cent level. White 
numbers were optimally visible with a nar- 
rower stroke width than black. 

Illumination level did not interact with
mode of figure-ground contrast.

The interaction of all three variables was
significant beyond the one per cent level.


Table 2


Analysis of Variance of Numeral Visibility Under Eight Experimental Conditions


                                     Sums of   Degrees of    Mean
Source of Variation                  Squares    Freedom     Square

A. Between stroke widths              .0016        1         .0016
B. Between illumination levels       1.8300        1        1.8300
C. Between black on white and
   white on black                     .0094        1         .0094
A × B                                 .1437        1         .1437
A × C                                 .1602        1         .1602
B × C                                 .0070        1         .0070
A × B × C                             .1220        1         .1220
Error                                 .3689       32         .01153
Total                                2.6429       39


* F.01 = 7.56; F.001 = 13.29.
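
The F ratios themselves did not survive in this copy of Table 2, but they can be recomputed from the sums of squares and the error mean square. The sketch below does so and classifies each ratio against the two critical values given in the table footnote. The pairing of the row labels with the values and the error degrees of freedom (40 subjects less 8 cells) are inferences from the text, so the output is a consistency check and not a quotation of the original table.

```python
# Sums of squares as read from Table 2. Matching the damaged row labels to
# these values is an assumption that agrees with the significance levels
# stated in the text.
SUMS_OF_SQUARES = {
    "A (stroke width)": 0.0016,
    "B (illumination level)": 1.8300,
    "C (figure-ground contrast)": 0.0094,
    "A x B": 0.1437,
    "A x C": 0.1602,
    "B x C": 0.0070,
    "A x B x C": 0.1220,
}
ERROR_SS = 0.3689
ERROR_DF = 40 - 8  # inferred: 40 subjects in 8 cells
F_01, F_001 = 7.56, 13.29  # critical values from the table footnote


def f_ratios(sums_of_squares, error_ss, error_df):
    """Each effect in a 2 x 2 x 2 design has 1 df, so its MS equals its SS."""
    ms_error = error_ss / error_df
    return {name: ss / ms_error for name, ss in sums_of_squares.items()}


def significance_level(f_value):
    if f_value >= F_001:
        return ".001"
    if f_value >= F_01:
        return ".01"
    return "n.s."


RATIOS = f_ratios(SUMS_OF_SQUARES, ERROR_SS, ERROR_DF)
for name, f_value in RATIOS.items():
    print(f"{name:28s} F = {f_value:7.2f}  {significance_level(f_value)}")
```

Run this way, the two-way interactions involving stroke width and the three-way interaction exceed the one per cent value, while the interaction of illumination level with figure-ground contrast does not, matching the verbal account of the results.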


In general terms, then, mode of figure- 
ground contrast and illumination level inter- 
act with stroke width, but do not interact 
with each other. These findings afford a 
reconciliation of the divergent results with re- 
spect to whether illumination alters optimal 
stroke width. When it is studied over a wide 
range—but no wider range than the practical 
situation presents—it does. And the two in- 
teractions make clear that the problem of 
whether white on black is more visible than 
black on white can be settled only if the 
optimal stroke width is determined separately 
for each, and for the illumination under which 
the comparison is to be made. These results 
should generalize to other illumination levels, 
however, provided the optimal stroke width is 
determined for that illumination level. Since 
there is no interaction between illumination 
level and mode of figure-ground contrast, the 
significant interaction of all three variables 
would seem to take place because of the in- 
teraction of stroke width with each of the 
other variables. Illumination level itself 
within the limits of the study does not alter 
the relative visibility of the two modes of 
figure-ground contrast. But illumination level 
does alter the optimal stroke width, and mode 
of figure-ground contrast alters the optimal 
stroke width, so that all three are related. 
Both illumination level and mode of figure- 
ground contrast must be known to specify 
the optimal stroke width. 

The statistical significance of all these find-
ings depends on the degree to which the as-
sumptions required by the analysis have been
met. It will be recalled that even the trans- 
formed data showed a tendency toward non- 
normality of distribution and heterogeneous 
variances. Cochran (6) points out that this 
might be expected to alter the significance 
levels obtained to a minor degree, but since 
all of the results were either clearly nonsig- 
nificant, or highly significant, this question 
can safely be ignored. 

These results have implications for both 
research and application. Further research 
should be accompanied by specification of the 
illumination level employed since the results 
obtained at one illumination level will not 
transfer to widely different illumination lev- 
els. Conversely, the optimal stroke width for 
each mode of figure-ground contrast will need 
to be determined for a wide range of illumi- 
nation levels. Application can then be made 
in terms of the level of illumination and mode 
of figure-ground contrast which is to exist in practice.
If the application is subject to a wide range 
of illumination levels, it seems likely that the 
best choice of stroke width would be that 
which would be optimal at the lowest illumi- 
nation level, since visibility would be ex- 
pected to increase more rapidly with rising 
illumination than it would decrease as a func- 
tion of nonoptimal stroke width. 


Summary 


Three variables in numeral visibility were studied: stroke width,
illumination level, and whether the number is white on black or black
on white. Forty subjects observed, providing five replications of
eight combinations of experimental conditions. The results were
treated by analysis of variance, with the interactions the items of
most interest. The results were these:


1. Neither stroke width nor mode of figure- 
ground contrast showed a significant influence 
on visibility, but illumination level was highly 
significant. 

2. Stroke width interacted significantly with 
illumination level. 

3. Stroke width interacted significantly with 
mode of figure-ground contrast. 


4. The interaction between illumination
level and whether the number was black on
white or white on black was not significant;
further, there was no suggestion in the data
that such an interaction exists.

5. The interaction of all three variables 
was significant. 


Received February 17, 1955. 


References 


1. Armed Forces-NRC Vision Committee. Stand- 
ards to be employed in research on visual dis- 
plays. Washington, D. C.: National Research 
Council, 1950. 

2. Atkinson, W. H., Crumley, L. M., & Willis, 
Marion P. A study of the requirements for 
letters, numbers, and markings to be used on 
trans-illuminated aircraft control panels, Part 
5, The comparative legibility of three fonts 
for numerals. U.S. Naval Air Material Cent., 
1952, No. NAM EL-609. 

3. Barrows, W. E. Light, photometry, and illumi- 
nating engineering. New York: McGraw- 
Hill, 1925. 


4. Berger, C. I. Stroke-width, form and horizontal 
spacing of numerals as determinants of the 
threshold of recognition. J. appl. Psychol., 
1944, 28, 208-231. 

5. Brown, F. R., Lowery, E. A., & Willis, Marion
P. A study of the requirements for letters,
numbers and markings to be used on trans-
illuminated aircraft control panels, Part 3,
The effect of stroke-width and form upon the
legibility of numerals. U. S. Naval Air Ma-
terial Cent., 1951, Rep. TED No. NAM EL-
609.

6. Cochran, W. G. Some consequences when the
assumptions for the analysis of variance are
not satisfied. Biometrics, 1947, 3, 22-38.

7. Craik, K. J. W. Instrument lighting for night
use. Air Ministry Flying Personnel Res.
Comm. Rep., Lond., 1941, No. FPRC 342.

8. Edwards, A. L. Experimental design in psycho-
logical research. New York: Rinehart, 1950,
Ch. 12.

9. Holmes, Grace. The relative legibility of black
print and white print. J. appl. Psychol.,
1931, 15, 248-251.

10. Johnson, P. O. Statistical methods in research.
New York: Prentice-Hall, 1949.

11. Kuntz, J. E., & Sleight, R. B. Legibility of
numerals: The optimal ratio of height to
width of stroke. Amer. J. Psychol., 1950,
63, 567-575.

12. Loucks, R. B. Legibility of aircraft instrument
dials: a further investigation of the relative
legibility of tachometer dials. USAAF Sch.
Aviat. Med. Proj. Rep., 1944, Proj. No. 265
(Rep. No. 2).

13. Loucks, R. B. Legibility of aircraft instrument
dials: The relative legibility of various climb
indicator dials and pointers. USAAF Sch.
Aviat. Med. Proj. Rep., 1944, Proj. No. 286
(Rep. No. 1).

14. Shapiro, H. B. Factors affecting the legibility
of digits. Amer. Psychologist, 1951, 6, 364.
(Abstract)

15. Taylor, Cornelia D. The relative legibility of
black and white print. J. educ. Psychol.,
1934, 25, 561-578.





The Journal of Applied Psychology 
Vol. 39, No. 6, 1955 


The Peripheral Viewing of Dials 


John W. Senders 


Aero Medical Laboratory, Wright Air Development Center 
Ilse B. Webb 1


Antioch College 


and Charles A. Baker 


Aero Medical Laboratory, Wright Air Development Center


The long series of studies of pilot eye move- 
ments conducted at the Psychology Branch 
of the Aero Medical Laboratory (1) has 
shown that the amount of time spent by 
pilots in the reading of any particular instru- 
ment in flight is much shorter than the time 
taken by Ss in a laboratory situation. The 
laboratory studies have used tachistoscopic 
exposure of dials in which the Ss controlled 
the exposure duration. The pilot eye fixa- 
tion studies recorded the time that the pilots 
were fixated on a given instrument in actual 
flight conditions. One explanation which has 
been advanced (2) to account for this dis- 
crepancy is that the pilot has been able to 
develop, on the basis of previous experience, 
a set of expectancies about the indication on 
an instrument at any specified time. The S 
in the experimental situation, on the other 
hand, is presented with randomly selected 
and independent pointer settings, and is un- 
able, because of the very design of the ex- 
periment, to formulate any hypotheses or ex- 
pectancies. Once an expectancy has been 
formulated, the principal function of a fixa- 
tion on the instrument may be merely to con- 
firm or reject it—a function which can be 
accomplished far more quickly than a quan- 
titative reading. . Furthermore, there is ex- 
perimental confirmation of the fact that ex- 
pectancies do reduce errors in instrument 
reading (2). 

In the operational situation, instruments remain continuously present
in front of and around the operator even though he is not fixating
them. If the images of these instruments, falling on the periphery of
the retina, convey some information, then the range of possible
alternative pointer readings is reduced and expectancies are
formulated. This study is designed to determine the relationship
between the efficiency with which a dial can be used, and the extent
of the displacement of the image of that dial from the fovea.


1 Now at The Ohio State University.


Apparatus 


The apparatus consisted of a modified perimeter 
upon which simulated dials could be presented. 
Starting at a lateral displacement of 10°, the per- 
imeter was marked in 10° intervals to a lateral dis- 
placement of 80°. A stimulus card holder, contain- 
ing a simulated dial face, could be placed upon any 
one of these marked intervals. A red point of light 
of low intensity was used for a fixation point. The 
Ss were provided with a headrest and chin support, 
and with a pistol-grip switch which turned on a 
light to illuminate the dial. This switch in turn 
controlled a 1/100-sec. clock, which recorded the 
time during which the dial was illuminated. 

The stimuli consisted of white circles painted upon 
black cardboard octagons. Within each circle was 
a white pointer, the position of which could be 
changed by a rotation of the octagon. No gradua- 
tion marks or numbers appeared on these simulated 
dial faces. Four different pointer designs were used; 
three were conventional pointers whose widths were 
.05, .1, and .2 in. The fourth pointer design had the
same area as the .1-in. pointer, but tapered uniformly 
to the tip—i.e., it was double width at the broad 
end and effectively zero width at the top. At a 
28-in. viewing distance, the dial subtended a viewing 
angle of 7.5°. When illuminated, the white circle 
and pointer had a luminance of .5 foot-lamberts. 

Results from four Ss, experienced in psychological 
experimentation, are included in the data analysis. 
Data from one S, who was found during the experi- 
ment to have anomalous peripheral vision, were ex- 
cluded. 


Procedure 


The S was seated at the apparatus and fixated the 
red fixation point. Monocular vision was used 
throughout. When the stimulus card was placed on 




the right side of the perimeter, the right eye was 
used, and when on the left side of the perimeter, the 
left eye was used. In almost total darkness the E 
placed the stimulus card upon one of the marked 
positions of the perimeter. He then gave a “ready” 
signal, and the S actuated the pistol-grip light switch 
which illuminated the dial and started the clock. 
When he had read, or thought he had read, the 
pointer position, he released the switch, which turned 
off the light and stopped the clock. The S’s verbal 
report was one of eight compass headings—“north,” 
“northeast,” etc. Each dial was presented to each S 
64 times, once at each combination of angular
displacement of the stimulus and position of the
pointer. Since four pointer designs were used, each
S made a total of 256 readings. The order of
presentation of the stimuli was determined in advance
by random number tables. 
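
The factorial design described above (8 peripheral angles by 8 pointer positions, for each of 4 pointer designs) can be sketched as follows. This is only an illustration: the labels, the seed, and the choice to shuffle all 256 trials together are assumptions, since the text states only that the order came from random number tables.

```python
import random
from itertools import product

# Design factors taken from the text; the label strings are illustrative.
ANGLES_DEG = list(range(10, 90, 10))            # 10 to 80 degrees in 10-degree steps
HEADINGS = ["north", "northeast", "east", "southeast",
            "south", "southwest", "west", "northwest"]
POINTERS = [".05 in.", ".1 in.", ".2 in.", "tapered"]


def presentation_order(seed=None):
    """Return every pointer x angle x heading combination once, in random order.

    Assumption: all trials are shuffled together; the source does not say
    whether randomization was done within or across pointer designs.
    """
    rng = random.Random(seed)
    trials = list(product(POINTERS, ANGLES_DEG, HEADINGS))
    rng.shuffle(trials)
    return trials


trials = presentation_order(seed=1955)          # hypothetical seed
per_pointer = {p: sum(1 for t in trials if t[0] == p) for p in POINTERS}
print(len(trials), per_pointer)
```

Each pointer design appears 64 times and each S receives 256 trials in all, matching the counts given in the Procedure.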


Results 


Results are analyzed in terms of time and 
errors. An error was any response other than 
the correct one, but two kinds of errors were 
tabulated separately. These were (a) a reversal
error, which is a reported pointer position
differing from the correct one by 180° (e.g.,
a report of “northeast” for “southwest”),
and (b) any other error. Pointer sizes, of
the range used, were not significantly different,
nor was the one unusual shape different
from the others, as determined by an analysis
of variance. The data for all pointers
averaged are presented in Figs. 1 and 2.


Fig. 1. Time and total errors in estimation of pointer position as a function of the
peripheral angle of view. (Axes: errors of all types, in %, and time in seconds, against degrees nasal to fovea, 10 to 80.)


Fig. 2. Proportion of reversal errors and true errors in estimating pointer position
as a function of the peripheral angle of view. (Axes: proportion of all responses, shown for correct responses, reversals, and errors, against degrees nasal to fovea.)


Discussion 
The functions plotted in Fig. 1 of time and 
total errors are linearly related to angle of 
displacement with the exception of the high 
values attributable to the blind spot. Of 
course, the error function must eventually 
reach and stay at 87.5% (pure chance), and
the time function becomes indeterminate, depending
only on the willingness of Ss to
make guesses quickly or slowly. 
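
The chance levels referred to in this discussion follow from the eight possible compass headings. The short sketch below works them out; the function name is hypothetical and the calculation assumes that a guessing S chooses among the eight headings with equal probability.

```python
N_HEADINGS = 8  # "north," "northeast," etc., as in the Procedure

# Assumption: a pure guess is equally likely to be any of the eight headings.
chance_correct = 1 / N_HEADINGS        # proportion correct by guessing
chance_error = 1 - chance_correct      # proportion of errors by guessing


def exceeds_twice_chance(proportion_correct):
    """True if an observed proportion correct is more than twice the chance level."""
    return proportion_correct > 2 * chance_correct


print(f"chance correct = {chance_correct:.3f}, chance error = {chance_error:.3f}")
```

Under this assumption the error function levels off at 87.5%, and "over twice as many correct responses as chance" corresponds to more than 25% correct.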

Figure 2 shows the three-way breakdown 
of responses described above. It is evident 
that nonreversal errors remain less than 2% 
until the peripheral angle of view is greater 
than 40°. Reversal errors, on the other hand, 
comprise approximately 20% of the responses 
at 30° and 37% at 40°. 

The findings of this study have application 
in the design and arrangement of instruments 







on an instrument panel. If one is concerned 
with instruments in which the pointer move- 
ment is limited to less than 180° or the rate 
of change is slow, then an observer can dis- 
criminate among settings which differ by 45° 
almost perfectly even when the instrument is 
displayed as much as 40° from the line of 
sight. It should be noted that even at 80°, 
over twice as many responses are correct as 
would be predicted on a chance basis. In a 
situation where an operator is continuously 
tracking a particular instrument but also has 
a monitoring task of other instruments, such 
monitoring would be possible with minimum 
time demands if these instruments are of the 
moving-pointer type and if they are within 
40° of the primary instrument. If the
instrument-pointer alignment principles were
employed (e.g., all pointers aligned at 9
o'clock for desired values), an operator could
probably fixate a single display and simultaneously
monitor other instruments peripherally
up to 40° from the central field.




Summary 


The ability of Ss to see pointer position for 
four types of pointers at peripheral angles 
from 10° to 80° has been investigated. No 
significant differences exist among the point- 
ers used in this experiment. If reversal errors 
are ignored, the ability to discriminate pointer 
position when the dial is displaced as much 
as 40° from the fixation point is good. Even 
at 80° of displacement pointer position read- 
ings are better than chance. 


Received March 11, 1955. 


References 


1. Milton, J. L., McIntosh, B. B., & Cole, E. L. 
Eye fixations of aircraft pilots: IX. Routine 
maneuvers under day and night conditions, 
using an experimental panel arrangement. 
USAF, WADC Tech. Rep., 1954, No. 53-220. 

2. Senders, Virginia L. The effects of absolute and
conditional probability distributions of instru- 
ment settings on scale reading. I. Repeated 
exposures to the same setting. USAF, WADC 
Tech. Rep., 1954, No. 54-253. 





The Journal of Applied Psychology
Vol. 39, No. 6, 1955 


Relative Effectiveness of Two Standard Color-Vision Tests 1


George L. De Nittis 2


Fordham University 


Since the occurrence of a railway disaster 
in Sweden in 1875, which brought about the 
first systematic attempt to test for the phe- 
nomenon of color blindness, an increasing 
amount of time has been devoted to the de- 
tection of this deficiency. Of the many types 
of tests evolved and used, the one that has 
been most accepted has been the pseudo- 
isochromatic type. Its popularity is predi- 
cated on its compactness, ease and quickness 
of administration, interest to the subject, 
availability, and empirical nature. 

This acceptance, however, is not without 
certain reservations. There are many objec- 
tions to this type of test, particularly the 
uncontrolled nature of the testing situation, es- 
pecially with regard to the type of illumina- 
tion under which the test is administered. 
This specific objection, coupled with the need 
to assure reliability and validity for a test or 
battery of tests of this type, has led to in- 
tensive research in the areas of comparison 
of color-vision tests, diagnostic value of the 
tests, and development of new modifications. 

Of particular interest has been the research 
done with a view to determine the possible 
effects that level and type of illumination 
have on the perception of colors in the plates. 
With a light source of tungsten the percent- 
age of total energy yielded by various wave 
lengths within the visible spectrum depends 
upon the temperature of the filament. The 
lower the temperature the greater is the rela- 
tive energy concentrated in the red region of 
the spectrum. As the temperature of the fila- 
ment is increased, the relative contribution of 
the red region becomes less and that of the 
blue region becomes greater. The relative en- 
ergy values for various wave-lengths of light 

1 This study is based on a dissertation presented to 
the Faculty of the Graduate School of Fordham 
University in partial fulfillment of the requirements 
for the degree of Master of Arts. 

2 The author wishes to express his gratitude to
Professor Joseph G. Keegan for suggesting the problem
and to Professor Richard T. Zegers for his technical
advice.


at various filament temperatures have been 
experimentally determined and are to be 
found in tables. This method of determina- 
tion of color temperature is according to the 
Kelvin scale. The International Commission 
on Illumination has set up and the National 
Bureau of Standards has accepted certain 
norms in arriving at color temperatures speci- 
fied as “illuminants.” To obtain Illuminant 
A, for example, a calibrated 500-watt bulb 
burning at 2,848° K should be used. 
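
The shift of relative energy from the red toward the blue region as filament temperature rises can be illustrated with Planck's radiation law. The sketch below treats the source as an ideal blackbody, which a tungsten filament only approximates, and the 450 and 650 millimicron (nm) comparison wavelengths are arbitrary choices made for illustration; neither assumption comes from the study.

```python
import math

C2 = 1.4388e-2  # second radiation constant hc/k, in metre-kelvins


def planck_relative(wavelength_nm, temp_k):
    """Relative blackbody spectral radiance at one wavelength (arbitrary units)."""
    lam = wavelength_nm * 1e-9  # convert nanometres to metres
    return lam ** -5 / math.expm1(C2 / (lam * temp_k))


def blue_to_red_ratio(temp_k, blue_nm=450.0, red_nm=650.0):
    """Ratio of relative energy at a 'blue' wavelength to that at a 'red' one.

    The two wavelengths are hypothetical sample points, not values from the text.
    """
    return planck_relative(blue_nm, temp_k) / planck_relative(red_nm, temp_k)


# Color temperatures of Illuminants A, B, and C as given in the article.
for temp_k in (2848, 4800, 6500):
    print(temp_k, round(blue_to_red_ratio(temp_k), 3))
```

The ratio grows with temperature, which is the qualitative relation described above: the lower the temperature, the larger the share of energy in the red region.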

It is a well-established fact in colorimetry 
that a colored, reflecting surface will vary in 
saturation and hue depending upon the type 
of illumination which falls upon it. Because 
of this, pseudo-isochromatic plates have been 
selected and interpreted on the basis of stand- 
ard illumination. However, in spite of the 
importance of this factor, it has been the rule 
rather than the exception to administer these 
tests under nonstandard illumination. With 
the onset of World War II and the need to 
test validly the color vision of large numbers 
of the population, this problem assumed vital 
importance and impetus was given to research 
in this particular area. 

Two studies done by Hardy, Rand, and 
Rittler (3, 4) with the Ishihara test reveal 
that, in general, a higher percentage of those 
having defective color vision pass the test 
when tungsten illumination (A) instead of 
daylight (C) is used. From these studies 
and others conducted by outstanding workers 
in the field one may conclude: that type of 
illumination is of prime importance in the 
use of pseudo-isochromatic plates, if consist- 
ent and valid results are to be obtained; that 
use of nonstandard or yellowish illumination 
will aid some color defectives, particularly 
deuteranopes, to pass the test; and that in- 
structions should accompany these tests speci- 
fying the proper use of standard illumination. 

In view of this illumination factor, it was 
perhaps inevitable that an investigator would 
develop a test of a kind intended to be inde- 




pendent of type of illumination. In 1948, 
Freeman and Zaccaria (1) published an 
evaluation of such a test containing 17 plates 
and called the Illuminant-Stable Color Vision 
Test. They used a range of illuminants ex- 
tending from minus-blue to 14,000° K. For 
a comparison they used the AO (American 
Optical) Color Perception Test. Their re- 
sults bore out their original hypothesis that 
each plate of the I-S test would be stable over 
a wide range of illuminants; that each plate 
would have significant discrimination power; 
and that the relative difficulty of the plates 
would be well distributed. On the other hand, 
the AO test failed to meet these requirements. 

To further clarify the possibilities of this 
novel addition to tests of the pseudo-isochro- 
matic type, the present investigation was un- 
dertaken. A comparison was made of the 
commercial edition of the Illuminant-Stable 
Color Vision Test, containing 12 plates, and 
the AO Color Perception Test, containing 18 
plates as selected by Hardy, Rand, and Rit- 
tler (5). 

Apparatus 

A projection system which permitted the light to 
be thrown directly onto the test charts was devised. 
The light source consisted of a calibrated 500-watt 
projection lamp. 

The frontal lens of the apparatus was placed 26 
inches from the test plates so that the angle of inci- 
dence of the light was 45°. The test plates were 
placed on a stand on a table with a black back- 
ground and tilted slightly to eliminate disturbing re- 
flections from the light source. The subject viewed 
the plates from a distance of 30 inches with the line
of sight normal (90°) to the plates.




In order to evaluate the two tests, Illuminants A,
B, and C (as defined by the I.C.I.), at two levels 
each (high, H, and low, L) were used. To obtain 
Illuminant A a color temperature of 2,848° K was 
required. By interpolation of the calibration curve 
of the light source it was determined that 89.5 volts 
were needed to attain this color temperature. This 
voltage was controlled by a variac and was kept 
constant throughout the course of the experiment. 
Illuminants B and C were obtained by the use of 
six liquid filters made of optical glass cells 50 by 50 
by 10 mm. The formulae of the chemical solutions 
for use in these filters were obtained from the Hand- 
book of Colorimetry edited by A. C. Hardy (2). 

A Macbeth Illuminometer was used to determine 
the amount of light reflected from the test plates 
under each of the conditions of illumination. The 
range of intensity levels was from 27.03 foot-candles 
for Illuminant AH to 1.88 foot-candles for Illumi- 
nant CL. The level of illumination was sufficiently 
high under each condition to permit color discrimi- 
nation. A .70 neutral density filter was used with 
Illuminants AH and AL to reduce the amount of 
light to a level more comparable with the intensity 
levels of the other illuminants. A .30 neutral den- 
sity filter was used to obtain the lower level of 
illumination in each case. Table 1 summarizes the 
data regarding illumination. 
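
For reference, the nominal fraction of light passed by a neutral density filter follows from the standard definition of optical density, T = 10^(-D), and the densities of stacked filters add. The sketch below applies that definition to the .70 and .30 filters named above. The function names are hypothetical, and the figures are nominal transmittances only, not a reconstruction of the illuminometer readings.

```python
def nd_transmittance(density):
    """Nominal transmittance of a neutral density filter of the given optical density."""
    return 10 ** (-density)


def stacked_transmittance(*densities):
    """Nominal transmittance of several filters in series (densities add)."""
    return nd_transmittance(sum(densities))


for label, value in [
    ("NDF .70", nd_transmittance(0.70)),
    ("NDF .30", nd_transmittance(0.30)),
    ("NDF .70 & .30", stacked_transmittance(0.70, 0.30)),
]:
    print(f"{label}: {value:.3f}")
```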


Procedure 


Fifty male subjects were used of whom ten were 
classified color blind on the basis of past history, 
while 40 were considered to have normal color vi- 
sion. Since the purpose of this investigation was not 
any exhaustive study of color blindness it was not 
felt necessary to have an independent criterion of 
type of color deficiency. Had this been intended 
the experimenter realizes that a greater number of 
color-defective subjects would have had to be in- 
cluded in the investigation. The color-normal men 
ranged from 18 to 46 years in age, with an average 
age of 22.9 years. The color deficient ranged from
17 to 27 years in age, with an average age of 19.3
years. The range of the total sample was 17 to 46
years, with an average age of 22.1 years.


Table 1

Types of Illumination, with Color Temperature, Amount of Light (Expressed in Foot-Candles),
and Filter System for Each Type

Illuminant   Temp. (K)   Level of Illumination   Foot-candles   Filter System
A            2,848°      High                    27.03          NDF* .70
                         Low                      4.47          NDF .70 & .30
B            4,800°      High                    14.73          Sol.** B1 & B2
                         Low                      2.43          Sol. B1, B2 & NDF .30
C            6,500°      High                    12.50          Sol.** C1 & C2
                         Low                      1.88          Sol. C1, C2 & NDF .30

* Neutral density filter.
** See A. C. Hardy (p. 16).


Each subject was tested individually in a dark
room. Visual acuity was tested with a Betts
Telebinocular, and all subjects with corrected vision
were required to wear glasses during the experiment.
Immediately after this preliminary measure, the
subject was placed in a normal sitting position, given
instructions and tested. The average time per
session was 40 minutes with a short rest period if
desired.

The AO Color Perception Test (18 plates) and the
Illuminant-Stable Color Vision Test (12 plates) were
alternated in presentation so that one-half of the
subjects started with the AO test and the other half
with the I-S test. Each test was presented six times,
once under each of the conditions of illumination.
The individual plates of each test were presented in
different random order for each condition of
illumination. To compensate for any fixed sequence
effect, the conditions of illumination were alternated
so that only every sixth subject was tested with the
same sequence.

The examiner gave no cue as to whether a
response was correct or not. When a subject gave
more than one response, he was required to state which
one seemed to be more clear to him. This “best
choice” response was the one scored and
tabulated. A subject’s response had to be complete
in order to be counted correct. An answer which
omitted part of the digit or digits appearing on any
of the plates was considered an error.


Table 2

Average Per Cent Correct Responses of the Normal (N = 40) and Color-Blind (N = 10) Groups for
Two Color Vision Tests Under Six Conditions of Illumination

AO Test
Groups         AH      AL      BH      BL      CH      CL
Normal         99.30   ...     98.46   ...     98.46   97.62
Color blind    27.19   28.28   24.39   ...     23.29   22.73

I-S Test
Groups         AH      AL      BH      BL      CH      CL
Normal         77.89   ...     76.84   ...     75.80   74.97
Color blind    12.48   ...     14.97   ...     18.29   19.95


Results


Since the numbers of plates contained in the
AO and I-S tests are unequal (18 and 12,
respectively), the obtained scores were
expressed in percentages. In general, the
responses to each of the plates remained
relatively constant under all conditions of
illumination. The only exceptions occurred with
the color-blind group. Plates No. 7, 11, and 
17, of the AO test showed fewer correct re- 
sponses as color temperature increased, while 
plate No. 16 became easier. The most diffi- 
cult plates were 5, 10, 14, and 15. Each 
plate of this test distinguished clearly be- 
tween the normal and color-blind group even 
in those cases where the plates were relatively 
easy. 
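
The conversion of raw scores to percentages can be illustrated with a short sketch. The helper names are hypothetical. Truncation to one decimal place is an assumption; it was chosen because it reproduces the printed forms of several reported scores (77.7, 72.2, 16.6, and 58.3) from whole numbers of plates.

```python
import math

AO_PLATES = 18   # plates in the AO Color Perception Test
IS_PLATES = 12   # plates in the Illuminant-Stable Color Vision Test


def percent_correct(n_correct, n_plates):
    """Score expressed as a percentage of the plates in the test."""
    return 100.0 * n_correct / n_plates


def truncate_1dp(value):
    """Truncate (not round) to one decimal place; an assumed reporting convention."""
    return math.floor(value * 10) / 10


examples = {
    "AO, 14 of 18": truncate_1dp(percent_correct(14, AO_PLATES)),
    "AO, 13 of 18": truncate_1dp(percent_correct(13, AO_PLATES)),
    "I-S, 2 of 12": truncate_1dp(percent_correct(2, IS_PLATES)),
    "I-S, 7 of 12": truncate_1dp(percent_correct(7, IS_PLATES)),
}
print(examples)
```

Because one plate is worth about 5.6 points on the AO test and about 8.3 points on the I-S test, the same number of misread plates lowers an I-S percentage more than an AO percentage.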

The responses to each of the plates of the 
I-S test were fairly constant for each condi- 
tion of illumination except for Plate No. 7, 
which the color-blind group found increas- 
ingly easy to respond to as color temperature 
increased. Plates No. 8, 9, and 10 were most 
difficult for both groups. The highest per 
cent correct for the normal group was 57.5 
for Plate No. 10 under Illuminant AH. None 
of the color-blind group responded correctly 
to any of these three plates under any of the 
conditions of illumination. The difficulty ex- 
perienced by all subjects with these plates in- 
dicates that some factor other than ability to 
discriminate color enters into the determina- 
tion of a response. Such a factor could be 
the strong configurational design of the plates, 
with the resulting tendency for the digits to 
“disappear” into the background. While this 
difficulty is true for the test as a whole, it 
is accentuated with these three plates. How- 
ever, each plate of this test clearly distin- 
guished between the normal subjects and the 
color defectives. 

Table 2 presents the average per cent cor- 







rect responses for each test as a whole. The 
differences between the normal and color- 
blind groups within each of the tests and 
under all conditions of illumination are highly 
significant. 

It is clearly evident that the I-S test is the 
more difficult of the two for both groups. 
The average per cent correct responses for the 
normal group on the AO test ranges from 
97.62 under Illuminant CL to 99.30 under 
Illuminant AH, whereas the range on the 
I-S test is from 74.97 under Illuminant CL 
to 77.89 under Illuminant AH. For the 
color-blind group the per cent correct on the 
AO test ranges from 22.73 under Illuminant 
CL to 28.28 under Illuminant AL, while on 
the I-S test, the range is from 12.48 under 
Illuminant AH to 19.95 under Illuminant CL. 
Each group scored lower on the I-S test than 
on the AO. However, both tests distinguished 
clearly between normal and color blind. 

The normal group responses for both tests 
reveal a tendency for the errors to remain the 
same as color temperature increases. The 
color-blind group seems to be more sensitive 
to changes in illumination. With the AO 
test the errors tended to increase as color 
temperature increased, and with the I-S test 
they decreased as color temperature increased. 
Both of these tests, then, yielded the same 
results with the normal group, whereas with 
the color-blind group, the AO test seems to 
be easier under Illuminant A than it does under
Illuminant C, while the I-S test seems to
be easier under Illuminant C than it does under
Illuminant A.




Table 3 presents the range of scores ob- 
tained by each group for both tests under all 
conditions of illumination. The lowest score, 
in terms of per cent correct responses, ob- 
tained by the normal group with the AO test 
was 77.7, and the highest score 100. The 
color-blind group scores ranged from 5.5 to 
72.2. There was no overlap between the two 
groups with this test. 

With the I-S test, the scores of the nor- 
mal group ranged from 16.6 per cent to 
100 per cent. The color-blind group scores 
ranged from zero per cent to 58.3 per cent. 
The overlap of scores between the two groups 
with the I-S test is the result of the perform- 
ance of two normal subjects who did rather 
poorly on this test, and one color-blind sub- 
ject who did rather well. This raised the 
upper end of the range of scores for the color- 
blind group and lowered the bottom of the 
range of the normal group. 

The scores obtained on the I-S test are 
more variable and are lower than those on the 
AO test. They are doubtless lower in this 
experiment in part because of the more rigid 
scoring system used than that set up by the 
author of the test, and also because the test 
is actually more difficult, due to the nature of 
the plate designs. The greater variability of 
the scores might be explained by the powerful 
configurational forces which compete against 
the organizational force of the colors, which 
often confused the subject when first pre- 
sented to him and resulted in a lower initial 
score. 

An analysis of variance of the differences in 


Table 3 
Range of Scores (Per Cent Correct Responses) of the Normal (N = 40) and Color-Blind (N = 10) Groups
for Two Color-Vision Tests Under Six Conditions of Illumination











AO Test 
BH 
77. 


7 
5. 


Groups f AL 
88.8-100 
5.5-61.1 


Normal 
Color blind 


Groups f AL 


41.6-100 
0.0-33.3 0.0-58.3 


Normal 
Color blind 


I-S Test 


CH CL 
83.3-100  77.7-100 
5.5-72.2  5.5-61.1 


7-100 7-100 
5-66.6 .5-61.1 


BH BL CH CL 





41.6-100  16.6-100 
0.0-50.0 


41.6-100 166-100 33.3100 
0.0-50.0 0.0-41.6 8.3-41.6 










scores obtained with both tests under all the 
conditions of illumination by the 10 color-blind
subjects was undertaken. Second- and
third-order interaction terms were used as the 
error term to obtain the F ratios. 

The interaction of subjects and tests, with 
an F ratio of 4.90, was found to be significant 
at the .001 level. The interactions of subjects 
and illumination, and tests and illumination, 
with F ratios of 2.73 and 5.51, respectively, 
were significant at the .01 level. Upon fac- 
toring out it was found that this significance 
was due to subjects and tests rather than 
illumination. 

Subject variability was significant at the 
.001 level. The color-vision tests were also 
significantly different at the .001 level. The 
previous observation from obtained scores 
that the I-S test is more difficult was thus 
confirmed. 

What seems to be of greater importance, 
however, is the result, as shown by this 
analysis, that neither the types nor the levels 
of illumination used in this experiment had 
any significant influence on the scores ob- 
tained. The differences in score as color 
temperature changes are well within limits 
due to chance. That differences in type of 
illumination have no effect on scores obtained 
with color-vision tests of the polychromatic 
type, as used in this experiment, is contrary
to the findings of other investigators in the 
field, and particularly at variance with the 
conclusions arrived at by Freeman and Zac- 
caria (1). These authors found that the I-S 
test was more stable under different types of 
illumination than the AO test. However, the 
results obtained in this investigation show 
that the AO test is no less stable than the I-S 
test within the range of color temperatures 
used. 

Conclusions 


The principal conclusions arrived at are: 


1. An analysis of responses to the indi- 
vidual plates of each test under all condi- 
tions of illumination reveals that the deter- 
mination of a response to plates No. 8, 9, and 




10 of the I-S test is influenced by some other 
factor than ability to discriminate color. The 
strong configurational design of the plates, 
intended to “catch” the borderline color
defective, is a possible factor.

2. With the scoring system equated, both 
tests classified an individual as normal or 
color blind, under all conditions of illumina- 
tion. There was, however, the question of 
overlap of scores with the I-S test already 
mentioned. 

3. The I-S test was much more difficult 
than the AO test under all conditions of 
illumination. The analysis of variance shows 
this difference to be significant at the .001 
level of confidence. 

4. The intensity of illumination was not a 
factor in determining test scores. 

5. The types of illumination used in this 
experiment had no significant effect on the 
scores obtained with the AO and I-S tests. 
This is the more significant conclusion reached 
because it is contrary to the results of other 
investigators in this field. Any definitive ex- 
planation for this discrepancy must lie in a 
study of the methods used to arrive at de- 
sired color temperatures. 

6. Within the range of the color tempera- 
tures used, the AO test is as stable as the 
I-S test. 


Received March 23, 1955. 


References 


1. Freeman, E., & Zaccaria, M. A. Illuminant-Stable
Color Vision Test II. J. Opt. Soc. Amer.,
1948, 38, 971-976.

2. Hardy, A. C. (Ed.) Handbook of colorimetry.
Cambridge, Mass.: M.I.T. Technology Press,
1936.

3. Hardy, L. H., Rand, G., & Rittler, M. C. Tests
for the detection and analysis of color blindness:
the Ishihara test. J. Opt. Soc. Amer.,
1945, 35, 268-275.

4. Hardy, L. H., Rand, G., & Rittler, M. C. The
effect of quality of illumination on the results
of the Ishihara test. J. Opt. Soc. Amer.,
1946, 36, 86-94.

5. Hardy, L. H., Rand, G., & Rittler, M. C. A
screening test for defective red-green vision.
J. Opt. Soc. Amer., 1946, 36, 610-614.





The Journal of Applied Psychology 
Vol. 39, No. 6, 1955 


An Attempt at Validation of the Empathy Test 


Graham B. Bell 


Louisiana State University 


and Rhoda Stolper 1


U.S. Navy Medical Research Laboratory, New London 


The authors of the Empathy Test (TET) 
imply that it measures “the ability to put 
yourself in the other’s position, establish rap- 
port and anticipate his reactions, feelings and 
behaviors” (6). The test itself requires the 
individual taking it to estimate the relative 
popularity of different kinds of music among 
office or factory workers, the national popu- 
larity of specific magazines, and the personal 
habits of others most annoying to persons 
aged 25 to 39 or persons over 40. 

Validation of the test has been accom- 
plished mainly by demonstrating that those 
scoring high on TET are more effective in 
interpersonal relations, e.g., high scores have 
been related to success in predicting another’s 
behavior, to effectiveness of supervisors, and 
effectiveness of union business agents (6, 7). 

Although the general pattern of results supports
the contention that TET measures something
related to success in interpersonal relations,
there have been exceptions, e.g., TET
scores have been related to success in selling 
used cars but not to success in selling new 
cars or counter selling (7). High TET scores 
have been related to success in leadership in 
some experimental groups (5) but not others 
(8). 

The apparent inconsistency of results sug- 
gests that either the TET does not measure 
empathic ability as defined by the authors, 
or empathic ability as measured by the test 
is not important in certain situations. In 
evaluating either of these possibilities it 
would be extremely useful to know the exact 
kinds of behavior to which TET scores are 
related, i.e., to determine those things to
which the high scorer is sensitive. As we 
delineate the specific abilities associated with 
high TET scores, we will be able to clarify 
the concept of empathy, use it more effec- 


1 Formerly at Louisiana State University.


tively as an intervening variable in explain- 
ing human behavior, and possibly explain the 
apparent inconsistencies noted above. 

Previous studies have demonstrated that
scores on TET are not related to individual 
empathy as measured by Dymond-like tests 
of individual empathy (4, 5, 8). Since the 
format of TET requires the subject to pre- 
dict general attitudes or mass attitudes, it 
would seem logical to expect that TET might 
measure skills related to estimating group 
opinion. One such test of estimating group 
opinion is the Sensitivity to Other Persons 
Test (STOP). The STOP test requires each 
subject to estimate the average or group 
opinion of each member in reference to a 
series of personality traits (4). 

It would seem logical to propose that success
on the STOP test might be related to success
on TET.


Procedure 
TET Scores 


TET was administered under standard conditions 
and scored in accordance with the instructions in the 
manual (6, 7). 


STOP Scores 

Seventy-two college students were organized into 
12 six-man leaderless groups. The group task and 
procedure were as in Bell and French (1). After 
the completion of 30 minutes’ interaction the data 
necessary to determine STOP scores was collected 
by asking each subject to: (a) rate himself and each 
other subject on six personality traits using an eight- 
point scale; (b) estimate the average rating received 
by each other member on the basis of averaging (a) 
above. 

The six traits upon which the subjects (Ss) rated 
each other and estimated group opinion were those 
selected by Fiske’s (3) factor analysis of ratings: 
social adaptability, emotional control, conformity, 
inquiring intellect, confidence in self expression, and 
predictability. The experimenter explained the use 
of trait definitions which were passed out to each 
subject. 

Each subject’s prediction of each other subject’s 




average rating was compared to the actual average 
rating on each of the six traits. The deviations of 
prediction from the actual average were arithmetically
summed over all traits and subjects.
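
The scoring rule just described can be sketched as follows. The data layout, the function names, and the toy ratings are all hypothetical. Two readings of the text are assumed: that "arithmetically summed" means summing deviations without regard to sign, and that the actual average for a member is taken over all raters, including his self-rating.

```python
from statistics import mean


def actual_averages(ratings):
    """ratings[rater][ratee][trait] -> {(ratee, trait): mean rating received}.

    Assumption: the average is taken over every rater, self-ratings included.
    """
    raters = list(ratings)
    ratees = list(ratings[raters[0]])
    traits = list(ratings[raters[0]][ratees[0]])
    return {(m, t): mean(ratings[r][m][t] for r in raters)
            for m in ratees for t in traits}


def stop_score(judge, predictions, averages):
    """Sum of absolute deviations of one judge's predictions from the actual
    averages, over all other members and all traits (lower = more accurate)."""
    return sum(abs(pred - averages[(member, trait)])
               for member, by_trait in predictions.items() if member != judge
               for trait, pred in by_trait.items())


# Toy two-person group rated on one trait with an eight-point scale.
ratings = {
    "A": {"A": {"conformity": 5}, "B": {"conformity": 6}},
    "B": {"A": {"conformity": 3}, "B": {"conformity": 8}},
}
averages = actual_averages(ratings)
score_a = stop_score("A", {"B": {"conformity": 6}}, averages)
print(averages, score_a)
```

In this toy case member B's actual average is 7, A predicted 6, and A's deviation score is therefore 1.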


Results and Discussion 


The product-moment correlation between the 
STOP scores and TET scores was not significantly
different from zero, r = -.17. Therefore
in terms of statistical probability it may be 
concluded that the empathic ability of sub- 
jects as measured by TET is not positively 
related to these subjects’ ability to predict 
the stimulus value of group members to each 
other as measured by the STOP technique. 
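
A correlation of this size can be checked against zero with the usual t test for a product-moment coefficient, t = r√(N − 2) / √(1 − r²). The sketch below assumes that the printed coefficient is read as r = -.17 and that all 72 subjects contributed both scores; the authors do not state the N or the test they used.

```python
import math


def t_for_r(r, n):
    """t statistic for testing a product-moment correlation against zero (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


R_OBSERVED = -0.17   # assumed reading of the printed coefficient
N_SUBJECTS = 72      # assumption: every subject has both a STOP and a TET score

t_value = t_for_r(R_OBSERVED, N_SUBJECTS)
print(f"t({N_SUBJECTS - 2}) = {t_value:.2f}")
```

With 70 degrees of freedom the two-tailed .05 critical value of t is about 1.99, so a coefficient of this size falls short of significance under these assumptions.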

Since TET scores are related neither to individual
empathy (4, 5, 8) nor to STOP scores,
yet are related to success in some positions
where empathic ability might be an important
factor in success (7), the question naturally
arises as to what TET does measure.

In view of the fact that many of the be- 
havior indices to which scores on TET are 
related involve leadership, e.g., union busi- 
ness agents, supervisors, it might be appro- 
priate to consider some of the specific kinds 
of information which leaders seem to possess. 
Chowdry and Newcomb’s (2) finding that the 
leader is better informed than the nonleader 
on matters pertinent to the group, suggests a 
possible area of sensitivity that TET scores 
may tap. Although on the surface it may 
seem that STOP scores should be related to 
the ability to predict the type of group opin- 
ion measured by Chowdry and Newcomb, 
Taft (9) has shown that STOP-like scores 
are not related to this ability. Thus a test 
of the above relationship between TET and 
leaders’ information might be very fruitful in 
determining what TET measures. 


Conclusions 


This attempt at validating The Empathy 
Test was not successful. Apparently subjects 
who score highly on it do not estimate the 
group’s interpersonal estimate of each other 
any more effectively than low scorers. Suggestions
have been made of other areas that
may prove useful in investigating the actual
kinds of behavior measured by TET.


Received January 10, 1955. 


References 


1. Bell, G. B., & French, R. L. Consistency of individual
   leadership position in small groups
   of varying membership. J. abnorm. soc. Psychol.,
   1950, 45, 764-767.

2. Chowdry, K., & Newcomb, R. M. The related
   ability of leaders and non-leaders to estimate
   opinion of their own group. J. abnorm. soc.
   Psychol., 1952, 47, 51-57.

3. Fiske, D. W. Consistency of the factorial structures
   of personality ratings from different
   sources. J. abnorm. soc. Psychol., 1949, 44,
   329-344.

4. Gilbert, O. E. The relationship of schizophrenia
   and paranoia to empathy. Unpublished master’s
   thesis, Louisiana State Univer., 1953.

5. Hall, H. E., Jr. Empathy, leadership, and art.
   Unpublished master’s thesis, Louisiana State
   Univer., 1953.

6. Kerr, W. A., & Speroff, B. J. The measurement
   of empathy. Chicago: Psychometric Affiliates,
   1951.

7. Kerr, W. A., & Speroff, B. J. The empathy test,
   supplements to manual. Chicago: Psychometric
   Affiliates, 1951.

8. Stolper, Rhoda. An investigation of some hypotheses
   concerning empathy. Unpublished
   master’s thesis, Louisiana State Univer., 1954.

9. Taft, R. Some correlates of the ability to make
   accurate social judgments. Unpublished doctor’s
   dissertation, Univer. of California, 1950.





The Journal of Applied Psychology 
Vol. 39, No. 6, 1955 


Prolonged Reading Tasks in Visual Research ¹


Miles A. Tinker 


University of Minnesota 


In conducting research on visual functions 
such as the effect of typographical variations 
or illumination upon speed of perception in 
reading, it is desirable at times to employ a 
relatively long reading period. The present 
experiment was designed to provide informa- 
tion on the results of using relatively long 
reading tasks in certain kinds of visual re- 
search. The main parts of the experiment 
are concerned with the effects of italics and 
all-capitals on speed of perception in reading. 


Materials and Procedure 


The first task was to devise a reading test that
would measure speed of perception in reading uncomplicated
by comprehension, i.e., with comprehension
held constant. Two approximately equivalent
forms, each containing 450 items of 30 words
each, were constructed (4). Each item is so arranged
that one word in the latter part of the item
spoils the meaning. The reader is to find the wrong
word as quickly as possible and draw a line through
it. This word cannot be identified without reading
the item. Accuracy in responding to the test is 99.9
per cent for Form I and 99.6 per cent for Form II,
i.e., speed of reading is measured as a single variable.
The median reliability coefficient, Form I vs.
Form II, is .865. Any time limit up to 30 minutes
of reading may be used on each form. Standardization
revealed that Form II is read 0.3 to 2.0 per
cent slower than Form I.

In the first experiment Form I was set in regular 
(roman) 10-point Excelsior type face with 2-point 
leading in a 20-pica line width on eggshell paper 
stock. Form II was the same typographically ex- 
cept that italic rather than roman type was used. 
Another copy of Form II was also printed exactly 
the same as Form I. The 192 university sophomore 
subjects were tested in classroom groups of about 32 
each with standard directions. The tests were assembled
as follows: In Group A, the control group, Form I and Form
II were typographically identical, i.e., both in roman
type. In Group B, Form
I was in roman type; Form II, in italic. Tests for Group
A and Group B were alternated from subject to subject.
The practice exercise was followed by Form I
and then Form II. The time limit was 30 minutes
for each form, with a mark on the test to indicate
each 10 minutes of testing.

¹ The writer is grateful to the University of Minnesota
Graduate School for a research grant to
finance this study.

In the second experiment test Forms I and II
were typographically identical, both printed in roman
lower-case type, for the control group, C (as in
Group A of the above experiment). In Group D, Form I
was in lower-case type as in the control group and 
Form II was typographically the same as Form I 
except that it was printed in all-capitals. There 
were 127 subjects (Ss) (university sophomores) in 
each test group, 254 in all. They were tested with 
standard directions in small classroom groups of 
about 32. A time limit of 16 minutes on each test 
form was used with a mark made after each 4 
minutes. Papers for the two test groups were alter- 
nated with successive Ss. 


Results and Discussion 


The results for the first experiment, italic 
vs. roman print, are given in Table 1. In- 
spection of this table reveals that italic print 
is read significantly slower than roman when 
the reading period is 10 minutes or more. 
The retarding effect of italics ranged from 
4.2 to 6.3 per cent. In an earlier study with 
a similar technique of measurement, Tinker 
and Paterson (5) found a retarding effect of 
only 2.7 per cent, which was not significant, 
when they used a time limit of 1¾ minutes.
University sophomores were subjects in both 
experiments. The same type of test and the 
same procedures were employed in the two 
studies. 

Data for all-capital versus roman lower- 
case print are given in Table 2. In every 
comparison, each 4-minute and the 16-minute 
period, the all-capital text was read signifi- 
cantly slower than the roman. The percent- 
age of retarding effect ranged from 10.2 to 
14.2. These results are comparable to those 
of Tinker and Paterson (5) who obtained a 
difference of 13.4 per cent when using a time 
limit of 1¾ minutes. The retarding effect
of all-capital print is relatively large. In 
fact, few typographical variations in printing 
practice produce differences as large as this. 
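
The percentage retardations quoted above can be retraced from the group means and the control-group "corrections" given in the table footnotes. The sketch below is only an illustration: the function name is hypothetical, and it assumes that each percentage is the corrected difference expressed relative to the Form I mean of the experimental group, an assumption that happens to reproduce the reported totals from the rounded means.

```python
def corrected_percent_diff(mean_form1, mean_form2, control_correction):
    """Per cent difference of Form II from Form I after the control-group correction.

    Assumption: the percentage base is the experimental group's Form I mean.
    """
    raw_diff = mean_form2 - mean_form1            # negative when Form II is read more slowly
    corrected = raw_diff + control_correction     # add the control group's Form I minus Form II gap
    return 100.0 * corrected / mean_form1


# Rounded means and corrections as printed in Tables 1 and 2 (30-minute and 16-minute totals).
italic_total = corrected_percent_diff(316.3, 294.5, 6.40)
all_caps_total = corrected_percent_diff(198.8, 169.6, 4.55)
all_caps_second = corrected_percent_diff(49.8, 41.4, 1.35)

print(round(italic_total, 1), round(all_caps_total, 1), round(all_caps_second, 1))
```

Because the original computations were carried to four decimal places, individual periods recomputed from the rounded means may differ from the printed percentages by about a tenth of a point.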

Table 1
The Effect of Italics on Speed of Reading
(N = 96 in each test group, 192 in all)

Test    Time             Test Form                    Diff. Between Means in
Group   Limit            and Face       Mean     SD   Paragraphs*   Per Cent

A       First 10 min.    I, Roman      108.5   17.6
                         II, Roman     107.0   19.7                   0.0               0.00
        Second 10 min.   I, Roman      106.8   17.5
                         II, Roman     106.5   17.8                                     0.00
        Third 10 min.    I, Roman      106.5   19.5
                         II, Roman     102.1   21.5                                     0.00
        Total: 30 min.   I, Roman      321.9   52.1
                         II, Roman     315.5   57.3                                     0.00

B       First 10 min.    I, Roman      106.9   20.3
                         II, Italic    100.9   24.6                                     5.04
        Second 10 min.   I, Roman      105.2   19.5
                         II, Italic     99.4   23.2                  −5.2
        Third 10 min.    I, Roman      105.4   22.1
                         II, Italic     94.2   26.0                  −6.3      .84
        Total: 30 min.   I, Roman      316.3   61.4
                         II, Italic    294.5   72.4                  −4.9      .89      6.60

* The differences in Column 6 are “corrected” by the amount of the differences between the mean scores of Form I and Form II
in Test Group A which serves as a control group. The “corrections” amount to +1.53 for the first 10 minutes, +0.30 for the
second 10 minutes, +4.45 for the third 10 minutes, and +6.40 for the 30 minutes. Original computations were carried to 4 decimal
places.


The experimental results presented here
provide specific information concerning the
measurement of speed of perception in read- 
ing in typographical studies which involve a 
relatively long-enduring visual task. The 
measuring instrument (4) described and employed
here may be used to advantage in
illumination as well as in typographical
studies. It has been employed successfully
in a series of studies (1, 2, 3). In situations 
of this kind, speed of perception in reading 
as measured in this test becomes a relatively 
sensitive technique for discriminating non- 
optimal from optimal conditions. 


Summary and Conclusions 


1. The purpose of these experiments is to
demonstrate the usefulness of prolonged periods
of reading in studying the effects of
varying typographical arrangements on
speed of perception in reading.


2. Reading periods of 10 minutes or more 
produced a significant retardation in reading 
italic in comparison with roman print. With 
a reading period of 1¾ minutes in an earlier
experiment the retardation was not significant.

3. Retardation in speed of reading all- 
capital material in comparison with roman 
print was large and approximately the same 
irrespective of the length of the reading pe- 
riod within the limits of 4 to 16 minutes. 
Approximately the same retardation was 
found with a time limit of 1¾ minutes in
an earlier experiment. 

4. Measuring speed of perception in read- 
ing is a relatively sensitive technique for use 
in typographical studies when prolonged pe- 
riods of reading are employed. 


Received January 10, 1955. 







Table 2
The Effect of All-Capital Printing on Speed of Reading
(N = 127 in each test group, 254 in all)

Test    Time             Test Form and                 Diff. Between Means in
Group   Limit            Type Form        Mean     SD   Paragraphs*   Per Cent

C       First 4 min.     I, Lower C.      50.0
                         II, Lower C.     48.5            0.0           0.0
        Second 4 min.    I, Lower C.      49.8
                         II, Lower C.     48.4            0.0           0.0
        Third 4 min.     I, Lower C.      49.4
                         II, Lower C.     49.1            0.0           0.0
        Fourth 4 min.    I, Lower C.      48.6
                         II, Lower C.     47.3            0.0           0.0
        Total: 16 min.   I, Lower C.     197.8
                         II, Lower C.    193.3            0.0           0.0

D       First 4 min.     I, Lower C.      50.3
                         II, All-Cap.     43.0
        Second 4 min.    I, Lower C.      49.8
                         II, All-Cap.     41.4                        −14.2
        Third 4 min.     I, Lower C.      50.2
                         II, All-Cap.     43.1
        Fourth 4 min.    I, Lower C.      48.5    9.2
                         II, All-Cap.     42.2    8.3                 −10.2
        Total: 16 min.   I, Lower C.     198.8   36.1
                         II, All-Cap.    169.6   30.7                 −12.4      .90     26.27

* The differences in Column 6 are “corrected” by the amount of the differences between the mean scores of Form I and Form II
in Test Group C which serves as a control group. The “corrections” amount to +1.49 for the first 4 minutes, +1.35 for the
second 4 minutes, +0.35 for the third 4 minutes, +1.32 for the fourth 4 minutes, and +4.55 for the 16 minutes. Original computations
were carried to 4 decimal places.


References

1. Tinker, M. A. Cumulative effect of marginal
   conditions upon rate of perception in reading.
   J. appl. Psychol., 1948, 32, 537-540.
2. Tinker, M. A. The effect of intensity of illumination
   upon speed of reading six-point type.
   Amer. J. Psychol., 1952, 65, 600-602.
3. Tinker, M. A. Effect of slanted text upon the
   readability of print. J. educ. Psychol., 1954.
4. [Entry illegible in the source.]
5. Tinker, M. A., & Paterson, D. G. Influence of
   type form on speed of reading. J. appl. Psychol.,
   1928, 12, 359-368.





The Journal of Applied Psychology 
Vol. 39, No. 6, 1955 


The Influence of Color of Paper upon Scores Earned on Objective
Achievement Examination ¹


William B. Michael and Robert A. Jones 


University of Southern California 


At the University of Southern California it 
has been customary to employ different colors 
of paper upon which to mimeograph the items 
of objective achievement examinations. Cus- 
tomarily, in a midterm or final examination, 
between two and ten forms are used, depend- 
ing upon the size of the class and the num- 
ber of different periods for which a given 
course may be scheduled. As a means of 
permitting instructors and proctors to differ- 
entiate among the various forms, the pres- 
ence of different colors of paper is a decided 
aid. 

Often only one limited pool of items is 
available for an examination. Through de- 
vising by random means two or more arrange- 
ments of items, an instructor is able to pre- 
pare two or more forms of an examination 
and to have each form printed on paper of 
a different color. Such a procedure tends to 
assure a relatively high level of security in 
the behavior of examinees, since a given num- 
ber of forms readily identified by the color of 
paper can be conveniently alternated by rows 
of seats. 

Surprising as it may seem, several profes- 
sors have reported that they have observed 
differential behavior on the part of the ex- 
aminees who, though responding to the same 
items, were exposed to different colors of pa- 
per. For a given group of test items, not only 
differences in the average number of correct 
responses were supposedly found relative to 
the color of paper employed, but also dif- 
ferences in the amounts of average time that 
elapsed in the completion of the test. Since 
there were frequently two orders of item pres- 
entation (usually referred to as two “scram- 
bled” forms) in the examinations, it was not 
known whether the apparent differences in
mean scores of groups of examinees might be
due to the presence of different colors, to the
existence of two or more arrangements of
items, or to the appearance of chance, or random
error, factors.

¹ The contents of the article are taken from a
paper presented by the senior author at the annual
meetings of the American Association for the Advancement
of Science, Section Q, December 27-30,
1954, at Berkeley, California.

Problem. Throughout the history of ap- 
plied psychology careful experimentation has 
not supported many “common-sense” views 
often held. It appeared that the opinions of 
both professors and students concerning the 
influence of color of paper upon examination 
scores were another illustration of a popular
notion that should be verified in light of data 
furnished from an appropriate experimental 
design. Therefore, it was the purpose of the 
writers to ascertain the extent, if any, of dif- 
ferences in the average scores of groups of 
college students on both multiple-choice and 
true-false items that appear in two random 
orders of presentation when they have been 
mimeographed on sheets of paper of differ- 
ent colors. 

Subjects. The scores of 300 male and 150 
female students on a final examination in 
elementary psychology (introductory course) 
were studied. In addition, the scores of 150 
men in one course in business administration 
(Law of Contracts, Sales and Negotiable In- 
struments) and of 88 men in a different 
course in business administration (Law of 
Business Organizations) were obtained for 
evaluation. Although registration in the two 
courses in business administration is open to 
both men and women, the scores of the 
women were not analyzed, since the number
of women that could be assigned to each ex- 
perimental condition varied between two and 
five. The age of subjects fell within the 
range between 18 and 26 years with virtually 
90 per cent of them being between 19 and 23. 

Design of study. In the class in general
psychology and in the larger one in business
administration the employment of five colors
of paper and two orders of presentation of
test items within each block of multiple-choice
and true-false items constituted the 10
experimental conditions, or a 5 × 2 experimental
design. In the two arrangements of
blocks of items there was no logical sequence
in their content, since a random selection of
items constituted the basis for the development
of the forms.


Table 1

Analyses of Variance of Final Examination Scores of 300 Male Subjects in Elementary Psychology

Source of Variation       df          SS         MS        F       p

            Number of Correct Multiple-Choice Items (R-W/3)

Between colors             4      561.71     140.43    1.384    >.05
Between orders             1       27.60      27.60     .272    >.05
Interaction C × O          4      176.11      44.03     .434    >.05
Subtotal                   9      765.42
Within groups            290   29,421.30
Total                    299   30,186.72

            Number of Correct True-False Items (R-W)

Between colors             4      560.70     140.18     .592    >.05
Between orders             1       19.26      19.26     .081    >.05
Interaction C × O          4      767.71     191.93     .811    >.05
Subtotal                   9    1,347.67
Within groups            290   68,612.00
Total                    299   69,959.67


Table 2

Analyses of Variance of Final Examination Scores of 150 Female Subjects in Elementary Psychology

Source of Variation       df          SS         MS        F

            Number of Correct Multiple-Choice Items (R-W/3)

Between colors             4      229.69      57.42     .805
Between orders             1      102.50     102.50    1.437
Interaction C × O          4      235.69      58.92     .826
Subtotal                   9      567.88
Within groups            140    9,984.00
Total                    149   10,551.88

            Number of Correct True-False Items (R-W)

Between colors             4      529.80     133.45     .652
Between orders             1      331.53     331.53    1.619
Interaction C × O          4      105.11      26.28     .128
Subtotal                   9      966.44
Within groups            140   28,676.40
Total                    149   29,642.83







For the third group of subjects only two 
colors, white and goldenrod (yellow-orange), 
were employed in conjunction with two or- 
ders of item presentation. In view of the 
small size of the class it was decided that it 
would be better to have larger numbers of 
subjects for each of the four experimental 
conditions rather than a smaller number under 
each of 10 conditions. The goldenrod color 
was selected in view of numerous complaints 
that both faculty members and students have 
registered as to its unpleasant esthetic quali- 
ties. 

The numbers of subjects for each experi- 
mental condition were as follows: 30 men 
and 15 women in the elementary psychology 
course, 15 men in the larger business adminis- 
tration class, and 22 men (for only four con- 
ditions) in the smaller business administra- 
tion group. In terms of limitations in seating 
arrangements it was necessary occasionally to 
eliminate randomly one or two excess men 
from an experimental condition. Three male 
students reporting a color-blindness handi- 
cap were not included in the samples. 

In order that possible differences in the 
amounts of illumination present within each
auditorium, as well as between the three auditoriums
utilized, might be controlled, students
were assigned to seats in terms of a
latin-square type of design.

The numbers of multiple-choice and true- 
false items in the three sets of examinations 
in general psychology, in the larger business 
administration class, and in the smaller busi- 
ness administration class were 60 and 90, 35 
and 115, and 34 and 106. Standard a priori 
scoring formulas embodying corrections for 
so-called chance successes were used for each 


type of item. 
a 
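
The scoring formulas named in the table headings (R-W/3, R-W/4, and R-W) follow the usual correction for chance successes, in which the number wrong is divided by one less than the number of answer choices. The sketch below illustrates that rule; the function name is hypothetical, and the numbers of choices per item (four, five, and two) are inferred from the divisors in the table headings rather than stated in the text.

```python
def corrected_score(right, wrong, n_choices):
    """Rights minus wrongs divided by (choices - 1): the a priori chance correction."""
    if n_choices < 2:
        raise ValueError("an objective item needs at least two answer choices")
    return right - wrong / (n_choices - 1)


# Illustrative (made-up) answer counts, one per item type used in the study.
print(corrected_score(45, 12, 4))   # multiple-choice scored R - W/3
print(corrected_score(30, 4, 5))    # multiple-choice scored R - W/4
print(corrected_score(70, 20, 2))   # true-false scored R - W
```

Omitted items count as neither right nor wrong under this rule, so an examinee who guesses blindly on every item has an expected corrected score of zero.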


Analysis of the results. For neither men 
nor women in the elementary psychology class 
were any statistically significant sources of 
variance in the two sets of scores found to be 
associated with the order of item presenta- 
tion, with the color of examination paper, or 
with the interaction of the two variables. 
The results of the analyses of variance of 
the scores for the 300 men and 150 women 
are summarized in Table 1 and Table 2, re- 
spectively. Both in terms of the analyses of 
variance of the scores for the two sexes and 
from an inspection of the means (which are 
not presented) corresponding to each of the 


Table 3 
Analyses of Variance of Final Examination Scores of 150 Male Subjects in Business Administration 
(Law of Contracts) 


Source of 


Variation df Ss 


Between colors 
Between orders 
Interaction CXO 


Subtotal 
Within groups 


Total 


190.62 
218.40 
86.89 


Number of Correct Multiple-Choice Items (R-W/4) 


MS 

47.66 
218.40 

21.72 


495.92 


86,053.79 


Number of Correct True-False Items (R-W) 


Between colors 
Between orders 
Interaction CXO 


Subtotal 9 
Within groups 140 


Total 149 


0.06 
548.33 
716.04 


137.08 542 
0.06 0002 
179.01 707 


1,264.43 
35,435.07 


36,699.50 





William B. Michael and Robert A. Jones 


Table 4 


Analyses of Variance of Final Examination Scores of 88 Male Subjects in Business Administration 


(Law of Business Organizations) 





Source of 


Variation j SS 


352.00 
22.00 
269.50 


Between colors 
Between orders 
Interaction CXO 


Subtotal 
Within groups 
Total 


29,330.45 


Number of Correct Multiple-Choice Items (R-W/4) 


PF 
1.008 
063 


aie 


352.00 
22.00 
269.50 


643.50 


349.17 


29,973.95 


Number of Correct True-False Items (R-W) 


152.91 
18.18 
76.41 


Between colors 
Between orders 
Interaction CXO 


Subtotal 
Within groups 


Total 87 


10 respective experimental conditions no dif- 
ferential effects relative to sex of the sub- 
jects could be inferred. 

In the larger of the two classes in busi- 
ness administration, no statistically signifi- 
cant sources of variance were found. As in 
the instance of the psychology class, the re- 
sults were negative. The analyses of vari- 
ance are described in Table 3. 

Only in the smaller class in business ad- 
ministration was a statistically reliable source 
of variation revealed. In Table 4, an F value 
of 4.188, which is significant beyond the 5 
per cent point (but not at the 1 per cent 
point), was found to exist for the source of 
variance in scores on true-false items corre- 
sponding to their appearance on white or 
goldenrod (yellow-orange) paper. However, 
in the instance of multiple-choice items an 
F value of only 1.008 was forthcoming. Al- 
though it is possible that the format of a test 
item may be instrumental in the realization 
of a significant difference in mean scores as- 
sociated with the use of the two colors of pa- 


152.91 
18.18 
76.41 


4.188 
498 
2.093 


247.50 
3,066.82 
3,314.32 


pers (white and goldenrod), it would seem 
somewhat more plausible to attribute the find- 
ing of an F value of the size of 4.188 to the 
existence of random error in view of the fact 
that among the 36 F’s calculated in the study 
one would expect to find about two to be sig- 
nificant at or beyond the 5 per cent point if 
the null hypotheses were true. 
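
The chance argument in the preceding paragraph can be retraced numerically. The sketch below recomputes the one significant F ratio from the sums of squares reported for true-false items in Table 4 and then the number of F's expected to reach the 5 per cent point by chance among 36 tests. The function name is hypothetical, the degrees of freedom (1 and 84) are inferred from the 2 × 2 design with 88 subjects, and the probability of at least one chance result assumes independent tests, which F's computed on the same examinees are not, so that figure is only a rough guide.

```python
def f_ratio(ss_effect, df_effect, ss_within, df_within):
    """F as the ratio of the effect mean square to the within-groups mean square."""
    return (ss_effect / df_effect) / (ss_within / df_within)


# Between-colors effect for true-false items in the smaller business class (Table 4).
f_colors = f_ratio(152.91, 1, 3066.82, 84)

n_tests = 36      # number of F's reported in the study
alpha = 0.05      # 5 per cent point

expected_by_chance = n_tests * alpha                # about two, as the text states
p_at_least_one = 1.0 - (1.0 - alpha) ** n_tests     # assumes independent tests

print(round(f_colors, 3), round(expected_by_chance, 1), round(p_at_least_one, 3))
```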

Conclusions. With the one exception cited 
for the fourth group of 88 examinees studied, 
no statistically significant differences in the 
average scores of college students appeared 
on either multiple-choice or true-false items. 
From the evidence furnished by four college 
groups it may be concluded that the color 
of paper upon which objective achievement- 
test items are mimeographed does not sig- 
nificantly influence the average number of 
correct responses of examinees (corrected for 
so-called chance successes). No significant 
differential behavior on the part of either 
men or women college subjects to color was 
demonstrated. 


Received January 5, 1955. 





The Journal of Applied Psychology 
Vol. 39, No. 6, 1955 


Rater Reliability and “Judgmental Fatigue” 


A. W. Bendig 


University of Pittsburgh 


In discussing psychometric scaling meth- 
ods, Woodworth (14, p. 378) suggested that 
the continued elicitation of judgments from 
a subject (S) resulted in less adequate judg- 
ments due to “esthetic fatigue.” Guilford 
(8, p. 235) listed this factor as the chief ob- 
jection to the method of paired comparisons, 
while Hamlin (9) concludes that clinical judg- 
ments lose validity when stimuli too com- 
plex and extensive are presented to the S. 
Cummings (6) calls this phenomenon “rater
demoralization and fatigue” and suggests that
it is similar to Woodworth’s concept. West
(13) has shown retest reliability in ranking 
to be a decreasing rectilinear function of the 
number of esthetic judgments required of the 
S between the test and retest ranking sessions. 
A study of the validity of military peer rat- 
ings (11) showed an initial warm-up effect in 
a rise of validity between the first and second 
groups of five ratings, and a subsequent decrement
in validity for the third and fourth
groups of five ratings. 

Cummings’ brief discussion of this concept
(6, p. 246) suggests that two phenomena are
involved. “Judgmental fatigue” is a cumulative
phenomenon that does not affect the first
judgments of S, but increasingly reduces the
reliability and validity of judgments as more
are elicited. Like motor response “fatigue,”
this loss in judgmental adequacy may be attributable
to boredom, decreasing motivation,
or to the accumulation of waste products.
“Judgmental disorganization” may occur when
S, in surveying the entire judgmental task he
is to attempt, perceives it as being too extensive
and complex for adequate handling.
This discouragement of a judge (a) should
occur when S is aware of the size and complexity
of the total task before the first judgments
are elicited, (b) should affect the reliability
and validity of all of his judgments,
including the first, and (c) should be a
monotonic increasing function of the size and
complexity of the total task. Both “judgmental
fatigue” and “judgmental disorganization”
are assumed to be influenced by the
judge’s level of motivation and previous experience
in the judgmental situation.

The following study was designed to assess
a possible loss in rater reliability as a temporal
series of self-ratings is required of the
judge when the total number of judgments
is constant between Ss. As suggested in
the previous paragraph it is an investigation
of “judgmental fatigue” and not of “judgmental
disorganization.”


Procedure 


Stimuli. A list of the names of 90 common foods 
was selected from those used by Wallen (12) and 
Thurstone (10). Foods were selected from Wallen’s 
list, using the criteria (a) that the food could ra- 
tionally be assumed to be familiar to college Ss, and 
(b) no sex differences were found by Wallen in Ss’ 
preferences for the food. Foods were selected from 
Thurstone, using criterion (a). 

The food names were numbered and divided into 
six nonoverlapping lists by drawing from a table of 
random numbers. Two lists (10A and 10B) con- 
tained 10 foods each, two lists (15A and 15B) in- 
cluded 15 foods, while two lists (20A and 20B) con- 
tained 20 foods. 

Scale. A nine-category food-preference rating scale 
was constructed, using five verbal anchors. Anchors 
A, C, E, G, and I from a previous study (4, Table 
1) were used to define the first, third, fifth, seventh, 
and ninth categories on the scale, with the remaining 
categories left unanchored. This scale was con- 
structed in the light of previous research which had 
indicated that nine-category scales equaled or sur- 
passed shorter scales in reliability when used by un- 
dergraduate Ss (1, 2, 5), that end anchors deviating 
widely in psychological distance from the center 
anchor yielded more reliable ratings (4), and that 
increased category anchoring increased rater reli- 
ability (1). This scale, along with rater instructions 
and the food lists to be rated, were mimeographed 
on single sheets for distribution to Ss. 

Subjects. A total of 120 undergraduate students 
enrolled in psychology classes served as raters and 
were randomly divided into six groups of 20 Ss each. 
Each of the Ss rated 45 foods using the above scale, 
with each S receiving three lists containing 10, 15, 
and 20 foods each. The specific lists used for a 
group and the order in which the lists were presented
to the Ss varied from group to group. Group
A rated lists 10A, 15A, and 20A in that order; Group
B got lists 10B, 20A, and 15B; Group C rated 15A,
20B, and 10B; Group D 15B, 10A, and 20B; Group
E received 20A, 10B, and 15A; while Group F got
lists 20B, 15B, and 10A. Each of the possible orders
of list length was used with one of the groups
and each of the six food lists occurred once on each
of the ordinal trials. Except for the above order
and trial restrictions, the assignment of specific lists
to the groups was random.


Table 1

Analyses of Variance of Transformed Rater Reliability and Bias Coefficients

                                 Rater Reliability       Rater Bias
Source of Variation       df     Mean Square     F       Mean Square

Total                     17
Lists                      5       1.9522      17.4*       .0062
  Length                   2        .8732        .3        .0021
  Lists of same length     3       2.6716                  .0089
Trials                     2        .0896                  .0182
Lists × trials            10        .1123                  .0708
  Length × trials          4        .1335                  .3223
  Residual                 6        .0981                  .0966

* Significant at the .001 level.


The Ss were told that the E was interested in comparing
their ratings of the foods with those of other
Ss who had rated the foods at other universities.


Results


The ratings of each of the six groups of Ss
for the three separate food lists were analyzed
by two-criterion analysis of variance procedures:
a total of 18 analyses. The mean
squares for foods, raters, and error in each
analysis were utilized to obtain measures of
rater reliability (average interrater correlation)
and rater bias (individual differences
between raters in the average rating assigned
the food stimuli) by the following formulas:

$$\text{Reliability} = \frac{(\text{Foods MS}) - (\text{Error MS})}{(\text{Foods MS}) + (N-1)(\text{Error MS})},$$

$$\text{Bias} = \frac{(\text{Raters MS}) - (\text{Error MS})}{(\text{Raters MS}) + (k-1)(\text{Error MS})},$$

where N is the number of raters in the group,
k is the number of food stimuli on the list,
and MS is a mean square from the analysis
of variance. The above formula for rater
reliability is the same as used in previous
studies (1, 2, 3, 4, 5) while the rater bias
formula differs from the one previously used 
in that it corrects for the varying number of 
food stimuli on the lists. These reliability co- 
efficients measure the average ability of single 
raters to discriminate differences between the 
food stimuli on a list, while bias coefficients 
are an index of individual differences between 
raters in the ratings they assign to a single 
food and (considering single food names as 
items on a test) is analogous to a measure of 
interitem homogeneity. 
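
The two coefficients can be computed directly from a raters-by-foods table of ratings. The sketch below is an illustration only: it assumes the "two-criterion" analysis is an ordinary two-way analysis of variance without replication (raters by foods), the function name is hypothetical, and the small rating tables are made up for the demonstration.

```python
def rater_coefficients(ratings):
    """Return (reliability, bias) for ratings[i][j] = rating of food j by rater i."""
    n = len(ratings)        # N, number of raters
    k = len(ratings[0])     # k, number of food stimuli
    grand = sum(sum(row) for row in ratings) / (n * k)

    rater_means = [sum(row) / k for row in ratings]
    food_means = [sum(ratings[i][j] for i in range(n)) / n for j in range(k)]

    ss_raters = k * sum((m - grand) ** 2 for m in rater_means)
    ss_foods = n * sum((m - grand) ** 2 for m in food_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_raters - ss_foods

    ms_raters = ss_raters / (n - 1)
    ms_foods = ss_foods / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    reliability = (ms_foods - ms_error) / (ms_foods + (n - 1) * ms_error)
    bias = (ms_raters - ms_error) / (ms_raters + (k - 1) * ms_error)
    return reliability, bias


# Two raters who order three foods identically but differ by a constant amount:
# perfect agreement on the foods, and a pure rater bias.
print(rater_coefficients([[1, 2, 3], [2, 3, 4]]))

# Two raters who order the foods in exactly opposite ways.
print(rater_coefficients([[1, 2, 3], [3, 2, 1]]))
```

The first example isolates the two sources of variation the formulas separate: agreement about the foods drives reliability, while differences in each rater's average level drive bias.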

Eighteen reliability and bias coefficients
were computed.¹ Each of these coefficients
was transformed to a normally distributed
variate by the formula given by Fisher (7, p.
219). Two analyses of variance were computed
from these transformed values: one for
the reliability measures and the second for
the bias coefficients. The results of these
analyses can be found in Table 1. The form
of this analysis deliberately confounded group
differences with Lists and Residual mean
squares to permit the generalization of any
significant results to the population of rater
groups from which these six groups were
drawn. It can be seen in Table 1 that the
single food lists differed significantly in rater
reliability (.001 level of confidence), but that
reliability was not affected by either the
length of the food list or by the ordinal position
of the list in the trials. None of these
main variables appeared to affect significantly
the measures of rater bias.

¹ A table of the reliability and bias coefficients has
been deposited with the American Documentation
Institute. Order Document No. 4623 from the ADI
Auxiliary Publications Project, Photoduplication
Service, Library of Congress, Washington 25, D. C.,
remitting in advance $1.25 for 35 mm. microfilm or
$1.25 for 6 × 8 in. photocopies. Make checks payable
to Chief, Photoduplication Service, Library of
Congress.
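
The normalizing step applied to the coefficients before the analyses of variance can be illustrated with Fisher's familiar z transformation. The text cites only a page in Fisher, so the exact variant used is not confirmed by the source; the sketch below assumes the standard form z = ½ ln[(1 + r)/(1 − r)], and the function names and sample coefficients are hypothetical.

```python
import math


def fisher_z(r):
    """Standard Fisher transformation, z = 0.5 * ln((1 + r) / (1 - r))."""
    return math.atanh(r)


def inverse_fisher_z(z):
    """Back-transform a z value to the correlation scale."""
    return math.tanh(z)


# Made-up coefficients standing in for a few of the 18 reliability values.
coefficients = [0.30, 0.45, 0.62]
z_values = [fisher_z(r) for r in coefficients]
mean_z = sum(z_values) / len(z_values)

print([round(z, 3) for z in z_values], round(inverse_fisher_z(mean_z), 3))
```

Averaging or analyzing on the z scale and transforming back is the usual reason for the step, since the sampling distribution of z is closer to normal than that of the raw coefficient.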

For comparative purposes the median re- 
liability and bias coefficients for each list 
length and trial are given in Table 2. The 
lists containing 10 food stimuli had a slightly
higher median reliability than lists of 15 or 
20 foods, but this difference was not statisti- 
cally significant when compared to the dif- 
ferences between lists of the same length. 
Rater bias increased slightly on later trials, 
but again this trend was not significant com- 
pared to the lists by trials interaction mean 
square. 

Discussion 

Under the conditions of this study, rater 
reliability did not appear to be affected by 
the temporal position of the stimuli in the se- 
quence of judgments. If we assume, as did 
West (13), that “judgmental fatigue” reduces
the judge’s ability to discriminate between
stimuli, then the ratings appeared unaffected
by this variable. The reliability of
food preference ratings is greatly influenced 
by the food stimuli comprising the list, as has 
been previously shown (3, 4), and even with 
the random assignment of food stimuli to the 
individual lists, as used here, significant dif- 
ferences between the lists were apparent. 
This meant that the tests of the effect of 
“judgmental fatigue” upon rater reliability
tended to lack statistical power. However,
it can be concluded that if “judgmental fatigue”
influenced rater reliability (and we
have no evidence that it did), its influence is 


Table 2 
Median Reliability and Bias Coefficients for Each 
Food List Length and Trial 


List Length 


Rater 
Variable 10 15 20
Reliability 30 
Bias 







minor. The obvious interest of the Ss in per- 
forming this rating task may also have re- 
duced the influence of “judgmental fatigue” 
in this study. 

The temporal order of the stimuli did not 
seem significantly to affect rater bias, al- 
though there was a slight tendency for bias 
measures to increase on later trials. It is in- 
teresting to note that the individual lists did 
not vary significantly in rater bias, but did 
in rater reliability. These results seem to 
confirm a previous tentative conclusion (3) 
that rater bias is relatively uninfluenced by 
the sampling of food stimuli on the list. 


Summary 


Subjects (N = 120) were randomly divided
into six groups and asked to rate a total of 
45 food stimuli for preference value using a 
nine-point scale. The 45 stimuli were ran- 
domly divided into individual lists containing 
10, 15, or 20 foods and each group received 
one of the possible orders of list length. In- 
dividual rater reliability and bias measures 
were computed for each list and group and 
these coefficients, after being normalized, 
were subjected to analyses of variance. Rater 
reliability was significantly different between 
the individual lists, but was not affected by 
either the length of the list or by the tem- 
poral order of the list in the series. Rater 
bias was unaffected by list, length, or trial 
variables. It was concluded that “judgmental 
fatigue” does not affect rater reliability or 
bias when the Ss report food preference self- 
ratings. 


Received January 13, 1955. 


References 


1. Bendig, A. W. The reliability of self-ratings as
a function of the amount of verbal anchoring
and of the number of categories on the scale.
J. appl. Psychol., 1953, 37, 38-41.

2. Bendig, A. W. Reliability and the number of
rating scale categories. J. appl. Psychol.,
1954, 38, 38-40.

3. Bendig, A. W. Reliability of short rating scales
and the heterogeneity of the rated stimuli.
J. appl. Psychol., 1954, 38, 167-170.

4. Bendig, A. W. Rater reliability and the heterogeneity
of the scale anchors. J. appl. Psychol.,
1955, 39, 37-39.

5. Bendig, A. W., & Sprague, J. L. Rater experience
and the reliability of case history ratings
of adjustment. J. consult. Psychol., 1954, 18,
207-211.

6. Cummings, S. T. The clinician as judge: judgments
of adjustment from Rorschach single-card
performance. J. consult. Psychol., 1954,
18, 243-247.

7. Fisher, R. A. Statistical methods for research
workers. (10th Ed.) New York: Hafner,
1948.

8. Guilford, J. P. Psychometric methods. (1st
Ed.) New York: McGraw-Hill, 1936.

9. Hamlin, R. M. The clinician as judge: implications
of a series of studies. J. consult. Psychol.,
1954, 18, 233-238.

10. Thurstone, L. L. An experiment in the prediction
of choice. Univer. Chicago Psychometr.
Lab. Rep., 1951, No. 68.

11. U. S. Dept. Army. AGO. PRB. A study of
officer methodology: III. Order of rating and
validity of rating. Personnel Res. Br. Rep.,
1952, No. 902, 14 p.

12. Wallen, R. Sex differences in food aversions.
J. appl. Psychol., 1943, 27, 288-298.

13. West, Evelyn M. Test-retest reliability in ranking
as a function of “esthetic exhaustion.”
Amer. Psychologist, 1952, 7, 332-333. (Abstract)

14. Woodworth, R. S. Experimental psychology.
New York: Holt, 1938.





The Journal of Applied Psychology 
Vol. 39, No. 6, 1955 


A Note on Alternative Methods for Estimating Factor 
Scores 


David K. Trites and Saul B. Sells 
USAF School of Aviation Medicine 


At the conclusion of factor analysis it is 
frequently desired to estimate individual 
standings on the functional entities extracted. 
There is general agreement among factor 
analysts that those variables most highly 
saturated on a given factor should provide 
the components for estimating that factor. 
At this point, however, there is a divergence 
of opinion as to how this may best be accom- 
plished. 

In general, the methods of estimating fac- 
tor scores involve some linear combination of 
the variables. The simplest and most com- 
mon method of weighting is to assign each 
variable (measured in standard scores) a 
weight of unity (1, 4) with the algebraic 
sign corresponding to the sign of the factor 
loading of the variable. 

All of the alternative methods involve der- 
ivation and use of fractional weights. Of 
these techniques the easiest and most direct 
utilizes a weight equal to the value of the 
factor loadings of the variables, or, if more 
than one factorization has been performed 
and essentially the same factorial structure 
obtained, the average of the loadings. 

Although arguments can be advanced in 
favor of both unit and fractional weights, it 
is sometimes desirable to compare alternative 
methods empirically. Such a comparison has 
been made using data developed by Cattell in 
a research contract¹ sponsored and monitored
by the Department of Clinical Psychology of 
the USAF School of Aviation Medicine. 

In this research a battery of approximately 
104 paper-and-pencil and performance tests 
of personality variables was administered to 
1,000 student pilots as they entered the Air 
Force Flight Training Program at Greenville 
Air Force Base, Greenville, Mississippi, dur- 
ing 1951 and 1952. From the group of 1,000, 
two samples were selected, one of 500 sub- 


1 This research is being prepared for publication as 
a USAF School of Aviation Medicine project report. 


jects, the other of 250 subjects. The test 
variables for each sample were intercorrelated, 
factored, and rotated to simple structure, in- 
dependently (2). From the factors identi- 
fied in each analysis, six were considered as 
being clearly common to both. 

For estimation of each subject’s score on 
each factor, the two methods discussed previ- 
ously were used. In the first method the 
standard scores for each variable were com- 
bined with unit weight, and these sums taken 
as the unit weighted estimates of each sub- 
ject’s factor scores. In the second technique 
the average factor loading for each variable 
was utilized as the weight. To estimate a 
subject’s score for a particular factor, each 
of the component variables was multiplied 
by the appropriate weight. The sums of 
these products were designated fractionally 
weighted factor scores. 

Following the computation of unit and 
fractionally weighted factor scores for each 
subject, the correlation between the two sets 
of scores was obtained for each of the six 
factors. The correlations are presented in 
Table 1. 
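
The two scoring procedures just described may be made concrete with a minimal sketch. The loadings, the sample size, and the single-common-factor model used to generate the data are all hypothetical assumptions introduced for illustration; they are not Cattell's data or the values analyzed here.

```python
import math
import random


def standardize(values):
    """Convert raw values to standard scores (mean 0, SD 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


random.seed(0)
hypothetical_loadings = [0.8, 0.7, 0.6, -0.5]  # assumed, for illustration only
n_subjects = 500

# Simulate one common factor and four tests saturated on it.
common_factor = [random.gauss(0, 1) for _ in range(n_subjects)]
standard_scores = []
for loading in hypothetical_loadings:
    unique_sd = math.sqrt(1 - loading ** 2)
    raw = [loading * f + unique_sd * random.gauss(0, 1) for f in common_factor]
    standard_scores.append(standardize(raw))

# Unit weights: +1 or -1, following the sign of each loading.
unit_scores = [
    sum(math.copysign(1.0, w) * z[i] for w, z in zip(hypothetical_loadings, standard_scores))
    for i in range(n_subjects)
]
# Fractional weights: the loading itself multiplies each standard score.
fractional_scores = [
    sum(w * z[i] for w, z in zip(hypothetical_loadings, standard_scores))
    for i in range(n_subjects)
]

r_unit_fractional = pearson_r(unit_scores, fractional_scores)
print(f"r between unit and fractional scores = {r_unit_fractional:.3f}")
```

Under these assumed loadings the two sets of estimates are nearly interchangeable, which is the kind of correspondence the comparison was designed to examine.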

Inspection of the coefficients and the 
squares of the coefficients indicates that there 
is little difference between the two sets of 
scores. Considering the discussion of corre- 
lation between two weighted sums presented 
by Gulliksen (3) and in view of the findings 
in item and regression analyses, such a high 
degree of correspondence between unit and 
fractionally weighted scores might have been 
anticipated. However, as Gulliksen has indicated,
correlation between the weighted
sums could have been zero if either of the 
means of the weights for any factor had been 
zero. In this instance the use of negative 
loadings might have resulted in such correla- 
tions; but if the weights had been obtained 
from a rotation to a positive manifold, zero 
correlations would have been unlikely. 
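
The dependence of this correlation on the mean of the weights can be shown with a short derivation. It is offered as a sketch under stated assumptions and is not presented as Gulliksen's own formula: the variables are standardized and reflected so that the unit weights are all +1, and in the second expression all intercorrelations are taken to be equal to a single value ρ.

```latex
% Standardized variables x_1, ..., x_n with correlation matrix R.
% U = 1'x is the unit-weighted sum; W = w'x is the fractionally weighted sum.
\[
r_{UW} \;=\; \frac{\mathbf{1}^{\top} R\,\mathbf{w}}
                  {\sqrt{(\mathbf{1}^{\top} R\,\mathbf{1})\,(\mathbf{w}^{\top} R\,\mathbf{w})}}
\]
% Assumed special case: every intercorrelation equals \rho.
% \bar{w} is the mean weight, s_w^2 = (1/n)\sum_i (w_i - \bar{w})^2,
% and c = 1 + (n-1)\rho.
\[
r_{UW} \;=\; \frac{\bar{w}\,\sqrt{c}}
                  {\sqrt{(1-\rho)\,s_w^{2} + c\,\bar{w}^{2}}}
\]
```

In this special case the correlation is zero when the mean weight is zero, and it approaches unity in absolute value as the spread of the weights shrinks or as the common intercorrelation rises.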









Table 1 


Correlations of the Six Fractionally Weighted Factor 








989, .980 995 ‘ 972 
978 863 260; 931 945 
988 5 946 880 961 








Gulliksen (3) has also pointed out that
when the standard deviation of the set of
weights is small and when the variables are
highly intercorrelated, there is little likelihood
of obtaining results with fractional
weights differing markedly from those obtained
with unit weights. Using factor loadings
as weights obviously precludes a large
standard deviation in an absolute sense, and
variables used in factor-score estimation are
usually highly loaded on the same factor, and
consequently usually highly intercorrelated.

It may be concluded, then, that in most
instances there is little gained by use of fractional
weights. Such is certainly the case
in the present comparison. Unit weighting
gives, for all practical purposes, the same order
of scores, and from the standpoint of computation
is much the simpler method.


Received January 17, 1955.


References


1. Cattell, R. B. Factor analysis. New York:
Harper, 1952.

2. Cattell, R. B. Parallel proportional profiles and
other principles for determining the choice of
factors by rotation. Psychometrika, 1944, 9,
267-283.

3. Gulliksen, H. Theory of mental tests. New
York: Wiley, 1950.

4. Thurstone, L. L. Multiple-factor analysis.
Chicago: Univer. of Chicago Press, 1947.





The Journal of Applied Psychology
Vol. 39, No. 6, 1955 


Memory for Names and Faces: A Characteristic of Social 
Intelligence? 


Walter A. Kaess and Sam L. Witryol 


The University of Connecticut 


This experiment was designed to analyze 
factors which are related to performance on 
the Memory for Names and Faces subtest of 
the George Washington University Social In- 
telligence Test (7). One might, but prob- 
ably should not, say that the memory-for- 
names-and-faces type of test has high face 
validity. The basic experimental task re- 
quires the subject to associate a name with a 
face. The testee is asked to memorize the 
names of a number of portrait photographs. 
Later, the testee is given a larger group of 
photographs and requested to identify the pic- 
tures previously studied. Most testees and 
not a few employment managers assume that 
this type of test measures a useful social skill. 
Many individuals who successfully deal with 
large numbers of people are reputed to have 
phenomenal memories for names and faces. 
Presumably the ability to say, “Hello, Mr. 
Piccalilli, I believe we met eight years ago in 
the elevator at Radio City,” serves as an ego- 
inflating stimulus which tends to make Mr. 
Piccalilli more likely to buy an insurance 
policy or cast a vote for a particular candi- 
date. Presumably, also, to address Mr. Pic- 
calilli as Mr. Mustard will be taken as a di- 
rect assault upon his person, and the sale or 
the vote will be irretrievably lost. 

Tests requiring the association of a name 
with a face enjoy considerable popularity. 
Besides being included in the George Wash- 
ington scale, this type of test is included in 
the Factored Aptitude Series (3) which is 
reportedly used by many industrial organizations.
The manual of the latter states,
“. . . the ability measured is broader than 
test content. Good memory for names and 
faces also means ability to recall other types 
of information.” Performance on the mem- 
ory-for-faces test is reported by the manual 
to be related to performance on such jobs as: 
agent, buyer, manager, receptionist, salesman, 
telephone operator, and waiter. 


The memory-for-faces type of test has a 
long history in psychology beginning as early 
as 1926 (4) when it was incorporated in the 
first form of the George Washington Univer- 
sity Social Intelligence Test. Successive re- 
visions of this scale continued to include this 
names-and-faces subtest, as does the present 
form (7). One of the authors of the scale, 
F. A. Moss, defined social intelligence as
“. . . the ability to get along with people.”
He wrote (4, p. 26): “One of the most im- 
portant factors in social intelligence is the 
ability to recognize faces and remember 
names. The person who gets along best with 
others does not have to be introduced to a 
man three or four times before he remembers 
that he has met him before.” Probably the 
best critical summaries of early research on 
the total scale and the particular subtest un- 
der consideration here are contained in the 
reviews by Thorndike and Stein (11) and by 
Jackson (2). The major inferences drawn 
from this early research were that the test is 
of dubious validity and that a meaningful 
criterion of social intelligence had not been 
definitively isolated. 

Important questions remain to be answered 
concerning this type of memory test. It is 
not clear exactly what the test is measuring. 
Does it measure some aspect of social intelli- 
gence, general memory, general intelligence, or 
spatial ability? The sometimes reported dif- 
ferences between various occupational groups 
are not particularly convincing, since a num- 
ber of factors could account for the differ- 
ences. 

We have not attempted a direct assault 
upon the difficult problem of the validity of 
this type of test. Rather, we have studied 
aspects of test performance which should hold 
if the test is to have general utility. The 
first question studied relative to validity con- 
cerned the generality of the ability measured. 
Does the ability to form strong printed name-photograph
associations correlate highly with
the ability to form person-spoken name associations
which are most common in everyday
life? Our second question considered whether
the instructions to the testee may not in
themselves destroy the very thing the test
wishes to measure. When we say, “Study
these pictures carefully because . . . ,” do
we not establish a preparatory set in even
the most self-centered individual so strong
that his performance represents rote memory
rather than some aspect of social memory?
The third question regarding the existence
and extent of sex differences was suggested
from the previous literature and from early
exploratory efforts in the present investigation.


Table 1

Order of Presentation for Names and Faces (GWU) and Miniature Life Situation (Interview)

Condition   Order of Presentation     Lab Sections
I           GWU—Interview             A, B, C (N's = 19, 22, 31)
II          Interview Set—GWU         D, E, F (N's = 21, 25, 25)
III         Interview Non-set—GWU     G, H, I (N's = 18, 22, 27)


Experimental Procedure

The subjects in this investigation were 111 males
and 99 females enrolled in nine laboratory sections
of the Introductory Psychology course at the University
of Connecticut. The average lab section size
was 23. Data collected from measures on students
in this course on the Allport-Vernon Study of Values
in recent years have been in very close agreement
with the composite norms of American college students
published in the manual. Similar representative
results have been obtained from scores on group
tests of general intelligence. As defined by these
measures and published norms, the population at
this university appears to be quite representative of
American college students.

The Memory for Names and Faces subtest of the
George Washington University Social Intelligence
Test (GWU) and a names-and-faces miniature life
situation were presented to all the lab sections in
counterbalanced order. When the GWU subtest was
presented first, the instructions served as a set for
this task and the miniature situation to follow (Condition I).
In those sections where the miniature
situation was presented first, instructional set was
counterbalanced with no instructions or lack of set
(Conditions II and III). Set groups were instructed
(Condition II):

“This lab will be a demonstration of a test of social
intelligence. I shall give each of these ten people
an alias, a false name. When I interview them
later, in a sort of radio show of the sidewalk interview
type, each will use only his alias, not his real
name.

“Listen carefully, for later on you will be asked
to remember the alias used by each subject.”


Table 1 demonstrates the manner in which this was 
done. Reference hereafter to experimental condi- 
tions will be made in terms of the labels employed 
in the table: (a) GWU for the Memory for Names
and Faces subtest, (b) Interview for the miniature
situation, (c) Interview Set for the miniature situa- 
tion with instructions, and (d) Interview Non-set 
for the miniature situation without instructions. 

As indicated earlier, on the GWU subtest the 
testee is asked to memorize the names under 12 pic- 
ture portraits of young men. Later the 12 pictures 
are presented again with 13 additional portraits.
From this matrix the testee is required to associate 
the correct one of four pictures presented in multi- 
ple-choice form with each of the 12 names. Each 
correct multiple-choice association scored one, 
with a total maximum score of 12. 

Five males and five females in each laboratory 
section were participants in the miniature social 
situations. Each was assigned an alias by the lab 
instructor who conducted a very brief “sidewalk in- 
terview” for about one minute. In every case it 
was made certain that the participant-subject spoke 
his alias clearly at least once. Later the students in 
the lab section were asked to match the correct 
name from the 10 aliases listed on the blackboard 
with the person who had adopted that name for the 
experiment. The participant-subjects were presented 
to the class in a prescribed random order to be 
identified from one of the names on the blackboard. 
Although the 10 participant-subjects were different 
for each of the nine lab sections, the 10 names used 
as aliases remained standard for all the groups. A 
score of one was credited for each correct matching. 
The total maximum score was 10. 

It should be noted again that two conditions (I
and II) served as a set for the miniature social 
situation in six of the laboratory sections (see Table 
1). The instructions for the GWU subtest provided 
a set when the Interview task was administered 
second (Condition I). When the Interview was ad- 
ministered first, set and non-set groups were coun- 
terbalanced. Set groups were given instructions as 
indicated above (Condition II), while non-set groups
did not receive any kind of preparatory instructions
(Condition III).

The three conditions permitted some evaluation of 
the effect of instructions upon test and interview per- 
formance. If instructions produce marked changes 
in test scores, the question arises as to which pre- 
paratory set-inducing situation provides the most 
useful measure. The miniature situation provided 
two groups who experienced a more “lifelike” situa- 
tion than the paper-and-pencil test afforded. The 
students who observed the demonstration resembled 
those who heard but were not the direct recipients 
of social introductions. The students who were in- 
troduced to the class were perhaps one step closer 
to the more common situation of being an active 
participant in a social situation. If performance on 
the GWU subtest measures an ability of consider- 
able generality, it seems reasonable to expect fairly 
high correlations between the GWU and the minia- 
ture situations. 

Finally, the digit-span subtest from the Wechsler- 
Bellevue Intelligence Scale was administered to 76 
subjects in order to examine the relationship between 
memory, as defined by recalling digits, and social 
recall, as defined by the GWU and Interview tasks 
employed in this investigation. 


Results 


A detailed analysis in terms of experimental 
conditions and sex is shown in Table 2. The 
observer-participant classification is omitted 
from the present analysis. The means for 
observers and participants were 6.09 and 
6.15, respectively. Although parallel analy- 
ses for observer and participant classifications 
were made, the distinction contributed no additional
information, while the effects of the
other variables remained in agreement with
the other analyses. Except to illustrate special
points, the following results are based on
all subjects, N = 210:


1. Analysis of miniature social situation. 
Sex differences on the Interview tasks demon- 
strated female superiority and were consistent 
for all three conditions. However, there were 
no differences in task performance as a func- 
tion of the three experimental conditions of 
set. A double analysis of variance¹ yielded
an F of 23.02 (F .001 = 11.38) between sexes, 
while the mean square between conditions of 
set was smaller than the error term or the al- 
most nonexistent interaction. 

2. Analysis of memory for names and faces. 
Analysis of variance demonstrated female superiority
on the GWU subtest. The F between
sexes was 12.37 (F .001 = 11.38).
The mean squares for the three conditions of
set, and for the sex and set interaction, were
near the .05 level of confidence. These re- 
sults are not as clear-cut as the comparable 
analyses for the Interview tasks above. Sex 
differences on the GWU subtest appear real 
from the fact of statistical significance in two 
of the three conditions; the possible interac- 

1 Bartlett’s test for homogeneity of variance and 
Snedecor’s suggestions for analyzing data having dis- 
proportionate subgroups (9) were used for all the 
analyses of variance. 


Table 2 


Summary of Differences between Sexes within Conditions of Set for Interview and GWU 


Condition Sex
I. GWU—Interview Males
Females
Both sexes
II. Interview Set—GWU Males
Females
Both sexes
III. Interview Non-set—GWU Males
Females
Both sexes

* .05 level of confidence.
** .01 level of confidence.
*** .001 level of confidence.


Interview 


M SD 
2.39 
2.17 
2.27 


9.28 
9.55 
9.43 
9.73 


10.60 
10.10 


2.98** 


8.18 
10.31 
9.10 







tion is regarded as the result of uncontrolled 
factors among the males in Condition III.
The evidence for influence upon scores by the 
three experimental conditions of set is slight. 

3. Analysis of sex differences. The magni- 
tude of the sex differences on both tasks was 
surprising. For all conditions the female 
mean score was .67 sigma units above the 
male mean for the interview, and .49 sigma 
units above the male mean for the GWU sub- 
test. Only 24 per cent of the female Inter- 
view scores were below the male mean, and 
30 per cent of the female GWU scores were 
below the male mean on the latter task. Bi- 
serial r’s between sex and Interview and be- 
tween sex and GWU were .418 and .307, re- 
spectively (N = 210); these were calculated 
as an illustrative device for emphasizing the 
magnitude of the sex differences. 
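
The reported percentages can be compared with what a normal curve would lead one to expect. The sketch below is offered only as a plausibility check and rests on two assumptions not stated in the text: that both score distributions are normal with a common standard deviation, and that the "sigma units" refer to that common standard deviation.

```python
from statistics import NormalDist


def share_below_reference_mean(standardized_difference: float) -> float:
    """Proportion of the higher-scoring group expected to fall below the
    other group's mean, assuming normality and equal SDs."""
    return NormalDist().cdf(-standardized_difference)


# Differences in sigma units and percentages are those reported in the text.
for task, difference, reported in [("Interview", 0.67, 0.24), ("GWU", 0.49, 0.30)]:
    expected = share_below_reference_mean(difference)
    print(f"{task}: normal-curve share = {expected:.3f}, reported = {reported:.2f}")
```

On these assumptions the expected proportions are close to the reported 24 and 30 per cent.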

4. Relationships between tasks. The Pear- 
son product-moment correlation obtained be- 
tween the Interview and GWU tasks was .315 
(p = .001) for the 210 subjects in the total 
population. This coefficient is somewhat in- 
flated as a result of combining heterogeneous 
groups. Thus, when the population was di- 


vided into homogeneous sex groupings, the 
correlation coefficient dropped to .273 for the 
111 males and .250 for the 99 females, both 


coefficients significant at the .01 level. The 
correlations within conditions were: .237 when 
the GWU served as a set (Condition I), .276 
with Interview set instructions (Condition 
II), and .451 with no set (Condition III). 
The tendency indicated in this sequence for 
the relationships to increase as the strength 
of preparatory set decreases is suggestive but 
not statistically significant. 
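
One conventional way of comparing two such coefficients is sketched below. The authors do not state which test they used, so the r-to-z comparison of independent correlations is an assumption; the condition totals of 72 and 67 are the sums of the lab-section N's given in Table 1 for Conditions I and III.

```python
import math
from statistics import NormalDist


def compare_independent_correlations(r1, n1, r2, n2):
    """Two-sided z test for the difference between two independent
    correlations, using Fisher's r-to-z transformation."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    standard_error = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z_statistic = (z1 - z2) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z_statistic)))
    return z_statistic, p_value


# Condition III (no set): r = .451, N = 67; Condition I (GWU as set): r = .237, N = 72.
z_stat, p_value = compare_independent_correlations(0.451, 67, 0.237, 72)
print(f"z = {z_stat:.2f}, two-sided p = {p_value:.3f}")
```

Even the largest contrast in the sequence falls short of the .05 level under this test, which agrees with the statement that the trend is suggestive only.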

5. Digit recall relationships and sex differ- 
ences. The Wechsler digit-span subtest mean 
for 39 males was 11.62; and for 37 females 
it was 12.14, with SD’s of 1.93 and 1.90, respectively
(t = 1.18). Four product-moment
correlations calculated for each sex between 
the digit-recall test and each of the two so- 
cial memory tasks were approximately zero: 
for the males, — .10 and — .09 between digit 
recall and Interview and GWU subtest, re- 
spectively; for the females, — .13 and — .05. 
Thus, ability to remember names and faces 
does not appear to be related to the ability 




to recall another type of information as de- 
fined by the digit-span test, nor are the sex 
differences on the social recall tasks reflected 
in similar differences on the digit recall task. 
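
The reported t for the digit-span comparison can be recovered from the summary figures alone. The sketch assumes a pooled-variance t test for independent groups and standard deviations computed with N - 1 in the denominator; neither detail is stated in the text.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-groups t from summary statistics, pooling the variances."""
    pooled_variance = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return (mean2 - mean1) / standard_error


# Males: M = 11.62, SD = 1.93, N = 39; females: M = 12.14, SD = 1.90, N = 37.
t_value = pooled_t(11.62, 1.93, 39, 12.14, 1.90, 37)
print(f"t = {t_value:.2f} on {39 + 37 - 2} degrees of freedom")
```

The result agrees with the reported value of 1.18 to two decimal places.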


Discussion and Conclusions 


The correlations between the miniature so- 
cial situation and the GWU names-and-faces 
tasks may be interpreted as an important in- 
dication of the generality of the latter scale. 
The magnitudes of the relationships, though
small, are of high statistical significance. 
Thus, there appears to be some generality in 
the ability measured by the GWU names- 
and-faces subtest. Possibly the magnitudes 
of these correlations would be raised if the 
reliability of the interviews were increased 
by increasing the number of the participant- 
subjects (lengthening the test), or by care- 
ful rehearsal on the part of the experimenter 
in each lab section. However, this is not 
vital to the problem investigated, because the 
aim of the present study was to evaluate the 
generality of the GWU subtest rather than 
to develop a parallel form for it. It is doubt- 
ful that the relatively low relationships found 
between the GWU and the rather uniform in- 
terviews would be higher with “reality” cri- 
teria where innumerable factors are uncon- 
trolled. It appears, then, that a memory-for- 
faces type of test possibly measures a useful 
social skill, but lacks in its present form suffi- 
cient generality to have widespread practical 
application. 

The magnitudes of the sex differences on 
both experimental tasks suggest a statistical
contamination not always clarified in research 
on social intelligence and social perception. 
If females are superior to males on several 
tasks, any correlations computed on these 
measures will be spuriously inflated when 
groups of mixed sex are studied. Exemplary 
is the attenuation of the .315 coefficient of 
correlation between the two names-and-faces 
tests for the mixed population of 210. When 
the same relationships were calculated for 
each of the sexes, the coefficients dropped to 
.273 and .250 for males and females, respec- 
tively. Without this logical correction it is 
possible that scores on the Interview and 
GWU tasks might be related to a test of 







verbal ability because females tend to be su- 
perior on this latter task also. In some situa- 
tions factor analyses of mixed sex populations 
could result in factor loadings highly con- 
taminated by sex differences. Conceivably, 
then, one could indicate a clear-cut factor, 
employ tests with high loadings to name the 
factor, and yet completely overlook the pos- 
sibility of measuring the factor by observing 
whether the testee wears lipstick! The com- 
bination of heterogeneous groups can yield 
spurious relationships leading to serious mis- 
conceptions in interpretation. 
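
The inflation produced by pooling groups that differ in mean on both measures can be demonstrated with a small constructed example. The scores below are hypothetical and were chosen only to make the arithmetic transparent; they are not data from this study.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical group A scores on two tasks.
group_a_task1 = [0, 1, 2, 3, 4]
group_a_task2 = [2, 0, 3, 1, 4]

# Hypothetical group B: the same pattern, shifted upward on both tasks.
shift = 3
group_b_task1 = [score + shift for score in group_a_task1]
group_b_task2 = [score + shift for score in group_a_task2]

r_within_a = pearson_r(group_a_task1, group_a_task2)
r_within_b = pearson_r(group_b_task1, group_b_task2)
r_pooled = pearson_r(group_a_task1 + group_b_task1, group_a_task2 + group_b_task2)

print(f"within A = {r_within_a:.3f}, within B = {r_within_b:.3f}, pooled = {r_pooled:.3f}")
```

The within-group relationship is identical in the two groups, yet the pooled coefficient is larger because the difference between group means contributes covariance of its own.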

It is difficult to account for the findings in 
the present investigation regarding the female 
superiority on both tasks. Early research on 
the GWU subtest indicated no sex differences 
(5, 1, 12), although sex differences for the 
total score and for other subtests of the scale 
were found. Some evidence reporting social 
intelligence to have a fairly high loading on 
the verbal factor (10) may be suggestive. 
On the other hand, Woodrow, who controlled 
for sex differences, found the GWU names- 
and-faces test to have highest loadings on a 
spatial factor (12). Furthermore, factors 
such as intelligence and digit-span memory 
are unlikely to account for the pronounced 
sex differences obtained here. Finally, un- 
published research on the population em- 
ployed in this study did not reveal signifi- 
cant sex differences in motivation and interest. 

Two alternative and speculative explana- 
tions are offered. Most of the previous re- 
search has been reported on the old Revised 
Form, First Edition of the Social Intelligence 
Test (6). In this form, pictures are of males, 
30 to 50 years of age, dressed in the archaic
style appropriate to the 1920’s. By mod- 
ern standards some of the picture-portraits 
might be colloquially termed “characters” or
“creeps.” The most recent form of the test 
(7) is composed of pictures of young men in 
the 18 to 20 years range. Each looks like 
an excellent model for a shirt ad in a teen- 
age magazine. Perhaps the change increased 
the feminine interest. The second and more 
speculative suggestion is based upon Ries- 
man’s concept of the inner- and outer-directed 
personalities (8). If the hypothesis is true 
that there has been a general drift toward 




outer direction in our time, it is also conceiv- 
able that this modification of character has 
been relatively more accelerated in females. 
This tentative interpretation stems from the 
fact that most of the research on the GWU 
scale was conducted in the late ’20’s and 
early ’30’s. Although both explanations are
speculative, the consequences are testable and 
may be worth further research. 


Summary 


The Memory for Names and Faces picture 
subtest of the George Washington University 
Social Intelligence Test and a miniature so- 
cial situation testing the association of spoken 
names with human subjects in a simulated life 
setting were administered to 210 students in 
nine Introductory Psychology laboratory sec- 
tions at the University of Connecticut. Three 
conditions of set and non-set were also intro- 
duced to evaluate the consequences of pre- 
paratory instructions upon the social recall 
task performances. Finally, the digit-recall 
subtest of the Wechsler-Bellevue scale was 
administered to 76 subjects in order to ex- 


plore the similarity between this type of 
memory function and the two social recall 


tasks. The evidence from the results in this 
experiment appears to warrant the following
conclusions: 


1. The relationships between the social re- 
call tasks are small (about .30) but statisti- 
cally significant, reflecting some generality; 
widespread application is seriously limited. 

2. Sex differences favoring females on both 
social recall tasks are highly significant and 
contradictory to some early research report- 
ing no differences. 

3. Set, as defined by preparatory instruc- 
tions for the two tasks, does not significantly 
influence performance. 

4. The social recall tasks are not related to 
one type of memory, as measured by the 
group administration of the Wechsler-Belle- 
vue digit-span subtest. 

Two speculative and possibly testable hy- 
potheses were offered to account for some of 
the findings. 


Received November 12, 1954.







References 


1. Hunt, Thelma. The measurement of social intelligence.
J. appl. Psychol., 1928, 12, 317-333.

2. Jackson, V. D. The measurement of social proficiency.
J. exp. Educ., 1940, 8, 422-474.

3. King, J. E. Factored aptitude series. Chicago
(105 W. Adams St.): Industrial Psychology,
1947.

4. Moss, F. A. Do you know how to get along
with people? Why some people get ahead in
the world while others do not. Sci. Amer.,
1926, 135, 26-27.

5. Moss, F. A., & Hunt, Thelma. Are you socially
intelligent? An analysis of the scores of 7000
persons on the George Washington University
Social Intelligence Test. Sci. Amer., 1927,
137, 108-110.

6. Moss, F. A., Hunt, Thelma, & Omwake, Katherine
T. Social Intelligence Test (Rev. Form,
1st Ed.). Washington, D. C.: George Washington
Univer., 1930.

7. Moss, F. A., Hunt, Thelma, Omwake, Katherine
T., & Woodward, L. G. Social Intelligence
Test (Rev. Form, 2nd Ed.). Washington,
D. C.: George Washington Univer., 1949.

8. Riesman, D. The lonely crowd; a study of the
changing American character. New Haven:
Yale Univer. Press, 1950.

9. Snedecor, G. W. Statistical methods. Ames,
Iowa: Iowa State Coll. Press, 1946.

10. Thorndike, R. L. Factor analysis of social and
abstract intelligence. J. educ. Psychol., 1936,
27, 231-233.

11. Thorndike, R. L., & Stein, S. An evaluation of
the attempts to measure social intelligence.
Psychol. Bull., 1937, 34, 275-285.

12. Woodrow, H. The common factors in fifty-two
mental tests. Psychometrika, 1939, 4,
99-108.





Book Reviews 


Schramm, Wilbur (Ed.). The process and 
effects of mass communication. Urbana: 
Univer. of Illinois Press, 1954. Pp. 586. 
$6.00. 


This is the third collection of readings in 
mass communications research edited by Pro- 
fessor Schramm and published by the Uni- 
versity of Illinois Press. Unlike its predeces- 
sors (Communications in Modern Society, 
1948, and Mass Communications, 1949), this 
volume is strongly oriented toward international
mass communication. It was prepared
initially, in fact, as a collection of background 
readings for new employees of the United 
States Information Agency who might have 
eventual research or evaluation duties. Sub- 
sequently, the editor thought the collection 
would be useful as a text for nongovern- 
mental groups and it was published in book 
form. It does not attempt to cover, except 
incidentally, material on communication con- 
tent analysis or material on communication 
agencies or the institutionalized media. 
Nearly two-thirds of its 38 items appeared 
initially as chapters or sections in books, 
while the remainder are drawn from journal 
articles or survey releases. 

Although Editor Schramm is keenly aware 
of the importance for international mass com- 
munication of area knowledge and the study 
of cultures, he nevertheless believes that much 
of the background necessary to understand its 
problems is the same as that required for 
an intelligent approach to other kinds of so- 
cial communication. This view doubtless un- 
derlies his roughly equivalent selection of 
content for the book from domestic mass 
communication studies (far removed from 
international concerns and apparently included
for their illustrative value, e.g., Wolfe
and Fiske’s “Why They Read Comics”), 
essay-type items on communication and opin- 
ion concepts (such as Blumer’s “The Crowd, 
The Public and the Mass”), and items di- 
rectly and specifically concerned with inter- 
national communications (such as Davison 
and George’s “Outline for the Study of In- 
ternational Political Communication”). 

In terms of content allocation, according to
the two main divisions suggested by the 
book’s title, the body of material bearing di- 
rectly on the process of mass communication 
is smallest. Much of the expository load on 
this topic is, in fact, carried by one of 
Schramm’s own essays. This item, the first 
piece in the volume, makes considerable use
of information theory analogies and of Osgood's
representational mediation process
schema. It presents the principal concerns 
of researchers on the mass communication 
process and shows the relative generality of 
process phenomena over nations and cultures. 
The larger share of the book’s items concern, 
directly or indirectly, mass communication 
effects. These range from the “primary ef- 
fect”—the securing of attention—to highly 
ramified considerations involved in modify- 
ing attitudes across national and cultural 
boundaries. Many of these items exemplify 
process concerns in specific research settings. 
Schramm has provided an introductory essay 
to each of the main sections of the book and 
these are of considerable aid in maintaining 
continuity. 


A brief analysis of the “demography” of 
the book’s items shows that the Public Opin- 
ion Quarterly was the original source of by 
far the largest number of journal articles:
eight. No other journal was the source for
more than one. Nearly 90 per cent of the 
items were written in the post-World War II 
era and may be regarded as of recent vintage. 
Only two prewar items were chosen. The 
author with the most items in the collec- 
tion (three) is Leonard Doob. Carl Hov- 
land, Bernard Berelson, Joseph Klapper, and 
Schramm each have two items. Contribu- 
tions from psychologists and sociologists have 
been chosen about equally and are well in 
excess of selections authored by practitioners 
of other disciplines, although political science 
is well represented. 

As a background volume or supplementary 
text this is a useful book. Many of the items 
have not been reprinted previously in the siz- 
able collection of readers now available in 
this field. The selections are apt in view of 
the editor’s intent and are particularly well 
chosen in the final section on problems of
achieving an effect with international com- 
munication. An appendix suggests a supple- 
mental list of 100 additional titles. The re- 
viewer noted most favorably that one of the 
book’s important papers—the Davison-George 
item—was reworked by the authors for this 
volume. Such a procedure, more generally 
applied in collections of readings, would up- 
date and doubtless improve certain “classic” 
papers whose value may be diminished by 
obsolescence of data, by new insights or in- 
tegrations, or by significant changes in the 
field which the item covers. 
Robert L. Jones 


University of Minnesota 


Bush, Robert R., and Mosteller, Frederick. 
Stochastic models for learning. New York: 
Wiley, 1955. Pp. 365. $9.00. 


This contribution to learning theory ana- 
lyzes the results of many learning experi- 
ments in terms of a probabilistic hypothesis 
after setting up a general theoretical model 
from which specific models to fit particular 
results can be derived. Part I, The Mathematical
System and the General Model, expounds
the mathematical properties of the
model and considers a number of special
cases. Part II, Applications, applies the model
to many specific experimental problems in
learning on both humans and animals (free-recall
verbal learning, avoidance training,
imitation, symmetric choice problems and
runway experiments) and gives much attention
to the statistical problems of estimating
model parameters and measuring goodness 
of fit. 

The basic principle from which the model 
is developed is that events and the responses 
to those events have certain probabilities and 
that reinforcements, which make responses 
more likely, occur. Given the initial probabilities
and the frequencies of the reinforcements,
one can work out operators from
which functions can be derived that condense
and generalize the empirical data from
a large number of experiments. Two assumptions
are made for the general model: first,
that the set of probabilities after an event has 
occurred depends only upon the set of prob- 
abilities just prior to the event and upon an 
operator associated with the event, and sec- 
ond, that the operators are linear. 
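
The two assumptions can be illustrated with a minimal sketch for the simplest case of a single response probability. The review does not state the form of the operators; the sketch assumes the familiar slope-and-limit form, in which an event moves the probability p to alpha * p + (1 - alpha) * lam. The function names, event labels, and parameter values are hypothetical and chosen only for illustration.

```python
def apply_operator(p, alpha, lam):
    """Apply one linear operator to a response probability.

    Assumed form: new_p = alpha * p + (1 - alpha) * lam, where lam is the
    limit the probability approaches if the same event keeps recurring.
    """
    for name, value in (("p", p), ("alpha", alpha), ("lam", lam)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return alpha * p + (1.0 - alpha) * lam


def run_trials(p0, events, operators):
    """Return the probability after each trial, starting from p0.

    The new value depends only on the current value and on the operator
    tied to the event that occurred (the first assumption); each operator
    is linear in p (the second assumption).
    """
    history = [p0]
    p = p0
    for event in events:
        alpha, lam = operators[event]
        p = apply_operator(p, alpha, lam)
        history.append(p)
    return history


# Hypothetical operators: reward pushes p toward 1, nonreward toward 0.
operators = {"reward": (0.8, 1.0), "nonreward": (0.9, 0.0)}
history = run_trials(0.2, ["reward", "reward", "nonreward"], operators)
print([round(p, 4) for p in history])
```

Because each operator is linear with alpha and lam in the unit interval, the probability always stays between 0 and 1, and repeated application of one operator brings it steadily closer to that operator's limit.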




Behavior is then a statistical phenomenon 
in which mathematical operators correspond 
to the events which alter response tendencies 
during learning. Since irreversible changes 
which make repeated sampling virtually im- 
possible occur (“stochastic” emphasizes the 
temporal nature of the probability problems 
considered), much of the statistics used so 
effectively with other psychological problems 
is inapplicable to learning. The authors meet 
this deficiency by developing specific statisti- 
cal procedures for the analysis of learning 
data that promise much in the way of gain in
efficiency and meaningfulness. 

While this book is not an applied book ex- 
cept in the sense that mathematical tech- 
niques of much power are applied to basic 
psychological problems, it represents a dis- 
tinct advance in the scientific analysis of 
learning data. Ultimately applied psychol- 
ogy may benefit from the methods so de- 
veloped. 

John E. Anderson 


University of Minnesota 


Mandell, M. M. A company guide to the se- 
lection of salesmen. New York: American 
Management Association, 330 West 42nd 
Street, Research Report No. 24, 1955. Pp. 
161. $4.75 ($3.50 to AMA members). 


This report discusses current practices of 
180 manufacturing firms in selecting sales- 
men. These consist of application blanks, 
interviewing methods, tests and measure- 
ments, and reference inquiries. In addition, 
the author outlines and discusses job analy- 
sis, recruitment, organizing and administer- 
ing a total sales selection program, and pre- 
sents 50 pages of facsimile reproductions of 
forms used by various companies. 

The report is, in effect, a manual for sales 
managers and relatively untrained sales per- 
sonnel workers. Research evidence is con- 
spicuous by its absence. The personnel psy- 
chologist, however, will be pleased to note 
the repeated pleas made by Mandell for sales 
organizations to base their selection programs 
on the results of personnel research. It is to 
be hoped that these pleas will be heeded by 
an increased number of sales managers. 


Donald G. Paterson 


University of Minnesota






























